Negative Logits
red           -0.07
arda          -0.07
言い          -0.07
Directory     -0.06
credibility   -0.06
bread         -0.06
φέρει         -0.06
ют            -0.06
odom          -0.06
roids         -0.06
Positive Logits
illustrated   0.07
ni            0.06
(cert         0.06
neut          0.06
cancellation  0.06
život         0.06
/oct          0.06
nå            0.06
bapt          0.06
„N            0.06
Activations Density: 0.027%