Explanations
expressions of negation or denial
Negative Logits
Token     Logit
ei        -0.20
y         -0.19
a         -0.19
eid       -0.19
e         -0.17
c         -0.16
ern       -0.16
o         -0.16
ประมาณ    -0.15
ی         -0.15
Positive Logits
Token         Logit
etwork        0.20
_REF          0.17
’t            0.17
mue           0.17
naire         0.16
't            0.16
ouncements    0.15
atural        0.15
iqu           0.15
avigate       0.15
Activations Density: 0.184%