INDEX
Explanations
phrases or structures indicating statements or opinions
Negative Logits
consolidated  -0.15
Nagar         -0.15
sake          -0.14
elivery       -0.14
IGHL          -0.14
446           -0.14
nor           -0.14
ediator       -0.13
ULO           -0.13
_$_           -0.13
Positive Logits
rief    0.20
té      0.17
lue     0.16
гл      0.15
ħ§      0.15
ulle    0.15
chy     0.14
ANGLES  0.14
ewidth  0.14
serie   0.14
Activations Density: 0.019%