Negative Logits
Во        -0.09
�         -0.08
Graham    -0.08
Brick     -0.08
mam       -0.08
Sabb      -0.08
Rover     -0.08
Во        -0.07
Pamp      -0.07
rouges    -0.07
Positive Logits
ments     0.08
え        0.08
effort    0.07
搭        0.07
prof      0.07
holding   0.07
carrier   0.07
update    0.07
Clem      0.07
Holy      0.07
Activations Density 0.005%