INDEX
Negative Logits
كرة    -0.08
.birth    -0.08
fus    -0.08
inciso    -0.08
tighter    -0.07
Microwave    -0.07
ímp    -0.07
Mathemat    -0.07
Horton    -0.07
Є    -0.07
Positive Logits
refers    0.09
plainly    0.08
ള്ളി    0.08
سالن    0.08
oured    0.08
רח    0.07
നെ    0.07
экземпля    0.07
Horiz    0.07
oned    0.07
Activations Density: 0.002%