Explanations
negative expressions and statements of denial or doubt
Negative Logits
ually   -0.15
Rage    -0.14
pons    -0.14
utta    -0.14
hari    -0.14
adem    -0.14
Rated   -0.14
isini   -0.14
ú       -0.13
abox    -0.13
Positive Logits
any       0.20
任何      0.20
Any       0.16
_any      0.16
ości      0.15
rocket    0.15
anymore   0.15
الحياة    0.15
emand     0.15
Any       0.15
Activations Density 0.313%