INDEX
Negative Logits
verages      -0.09
inus         -0.08
destruct     -0.08
們           -0.07
Uses         -0.07
circ         -0.07
reinc        -0.07
according    -0.07
leaning      -0.07
والأس        -0.07
Positive Logits
XYZ          0.10
./           0.09
HK           0.09
~/.          0.09
కమ           0.08
Command      0.08
\<           0.08
XYZ          0.08
HT           0.08
RO           0.08
Activations Density: 0.016%