Negative Logits
handguns      -0.07
에            -0.07
yorum         -0.06
پول           -0.06
nThe          -0.06
configured    -0.06
_SB           -0.06
اسه           -0.06
시에          -0.06
english       -0.06
Positive Logits
ugl           0.08
logos         0.08
tonic         0.06
-bar          0.06
usuarios      0.06
[word         0.06
!).           0.06
barber        0.06
Gover         0.06
Riding        0.06
Activations Density: 0.005%