Negative Logits
Birth            -0.07
insulting        -0.07
—who             -0.07
—that            -0.07
*-               -0.07
theater          -0.07
collaborations   -0.06
Sheldon          -0.06
)”               -0.06
sincerity        -0.06
Positive Logits
našeho           0.07
(mean            0.07
ному             0.06
(reordered       0.06
мож              0.06
busy             0.06
اهد              0.06
lrt              0.06
zástup           0.06
i                0.06
Activations Density: 0.102%