INDEX
Explanations
negative statements or terms that imply negation
Negative Logits
ockey        -0.16
orry         -0.15
bung         -0.15
icom         -0.14
IPS          -0.14
ection       -0.14
Markus       -0.14
ῦ            -0.14
_IMPLEMENT   -0.14
phy          -0.14
Positive Logits
čer          0.16
zek          0.16
стит         0.15
ovol         0.14
anz          0.14
eto          0.13
الإنجليزية   0.13
tid          0.13
rag          0.13
كر           0.13
Activations Density 0.000%