Negative Logits
弗              -0.07
Regular         -0.07
inadvertently   -0.07
_instruction    -0.06
sinus           -0.06
.Keyboard       -0.06
_RG             -0.06
Vš              -0.06
放              -0.06
Kum             -0.06
Positive Logits
injuring        0.06
mina            0.06
itelist         0.06
atisfied        0.06
holy            0.06
网址            0.06
ESTAMP          0.06
brat            0.05
.usage          0.05
луата           0.05
Activations Density 0.002%