Explanations
words indicating certainty or negation
Negative Logits
ww         -0.17
alsy       -0.17
ixin       -0.15
aint       -0.15
[NBSP]je   -0.15
ă          -0.15
baugh      -0.15
äd         -0.14
650        -0.14
rokes      -0.14

Positive Logits
plevel      0.15
hadn        0.15
acas        0.15
Saunders    0.14
_almost     0.14
acman       0.14
celib       0.13
リカ        0.13
گران        0.13
gratuita    0.13

Activation Density: 0.000%