NEGATIVE LOGITS
IO          -0.07
��          -0.06
comprised   -0.06
dinner      -0.06
oir         -0.06
fu          -0.06
TRANSFER    -0.06
tranqu      -0.06
jekt        -0.06
^(          -0.06
POSITIVE LOGITS
,row        0.06
又          0.06
Anthrop     0.06
.ad         0.06
مسلمان      0.06
bütün       0.06
andard      0.06
feature     0.06
Of          0.06
제가        0.06
Activations Density: 0.002%