INDEX
Negative Logits
Head          -0.08
(permission   -0.07
בחיים         -0.07
xca           -0.07
[_            -0.07
Hop           -0.07
exposures     -0.06
computed      -0.06
Sure          -0.06
gamma         -0.06
Positive Logits
Violence      0.08
lob           0.08
ilingual      0.07
type          0.07
PLAYER        0.07
tirelessly    0.07
địa           0.07
ласт          0.07
丝绸          0.07
bol           0.07
Activations Density 0.004%