INDEX
Negative Logits
urally      -0.83
sequ        -0.67
mund        -0.66
naires      -0.65
oxide       -0.64
Inquis      -0.64
aver        -0.64
atmosp      -0.63
mistaken    -0.62
leukemia    -0.60
Positive Logits
la          1.20
leigh       1.20
den         1.11
von         1.09
lee         1.01
aking       1.00
enne        0.99
lan         0.99
ak          0.97
nor         0.96
Activations Density 0.023%