INDEX
Negative Logits
명  -0.08
명을  -0.08
명이  -0.08
=sc  -0.07
Eins  -0.07
Jacobs  -0.07
horn  -0.07
plugged  -0.07
pops  -0.07
हुन्छ  -0.07
Positive Logits
lest  0.10
undes  0.09
чрез  0.09
undes  0.09
autant  0.09
-too  0.09
undesirable  0.08
inadvert  0.08
تعمیر  0.08
undue  0.08
Activations Density 0.063%