INDEX
Explanations
negations or forms of the word "not."
Negative Logits
no      -0.15
es      -0.14
hop     -0.14
ej      -0.14
ein     -0.14
not     -0.13
erate   -0.13
hen     -0.13
niet    -0.13
Ñĥки    -0.13
Positive Logits
necessarily   0.24
ori           0.20
anymore       0.19
ches          0.19
oriously      0.17
ched          0.17
quite         0.17
yet           0.16
tingham       0.16
rica          0.16
Activations Density: 0.183%
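
The page does not define the density figure. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and used only for illustration:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the source page does not state its definition.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 10,000 token positions, 18 of which activate the feature.
sample = [0.0] * 9982 + [1.5] * 18
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.183% would mean the feature fires on roughly 18 of every 10,000 sampled tokens.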