Negative Logits
quierda     -0.09
ponses      -0.09
spapers     -0.08
asured      -0.08
epen        -0.08
راسة        -0.08
šnja        -0.08
Mansion     -0.08
pon         -0.08
spaper      -0.08
Positive Logits
》第             0.08
of              0.08
inadvertently   0.08
substantially   0.07
第二            0.07
Vorte           0.07
fundamentally   0.07
dramatically    0.07
multiplied      0.07
(&:             0.07
Activations Density: 0.037%