NEGATIVE LOGITS
the       -0.10
and       -0.09
(         -0.09
["        -0.07
(blank)   -0.07
Lor       -0.07
dues      -0.07
traits    -0.07
due       -0.07
brill     -0.07
POSITIVE LOGITS
fana      0.09
refused   0.08
ప్రేక్షక    0.08
услыш     0.08
guud      0.08
ищ        0.08
advice    0.08
comecei   0.08
chast     0.08
arrivée   0.08
Activations Density: 0.000%