INDEX
Negative Logits
';↵↵↵↵        -0.09
-threatening  -0.08
_dem          -0.08
yearning      -0.08
thanks        -0.08
warning       -0.08
deb           -0.07
Где           -0.07
ВО            -0.07
ريت           -0.07
Positive Logits
units         0.09
different     0.09
Suit          0.08
Lego          0.08
nějak         0.08
ranking       0.08
numeric       0.08
dolls         0.08
counts        0.08
productivity  0.08
Activations Density 0.042%