Explanations
expressions indicating denial or negation
Negative Logits
dip       -0.17
forg      -0.17
ugas      -0.16
emap      -0.16
Dip       -0.15
hem       -0.15
Alam      -0.14
Gang      -0.14
nice      -0.14
onz       -0.14
Positive Logits
ónico      0.18
nodoc      0.17
iker       0.15
越         0.15
FINITE     0.15
DownList   0.15
iotic      0.14
icer       0.14
隠         0.14
iday       0.14
Activations Density 0.026%