Explanations
phrases and words that indicate analysis or critical evaluation of situations
Negative Logits
habi         -0.15
rane         -0.15
seys         -0.14
erland       -0.14
ral          -0.14
rani         -0.14
виду         -0.14
OTHERWISE    -0.14
.obtain      -0.14
εί           -0.13
Positive Logits
again        0.60
again        0.52
Again        0.51
Again        0.48
AGAIN        0.41
又           0.40
AGAIN        0.39
_again       0.37
wieder       0.35
novamente    0.35
Activations Density 0.025%