INDEX
Negative Logits
ILLE        -0.82
urses       -0.68
abama       -0.65
ITIES       -0.65
scill       -0.63
imation     -0.61
izabeth     -0.61
udder       -0.61
ught        -0.61
ITY         -0.60
Positive Logits
Twain       1.25
eting       1.10
eters       1.09
Zuckerberg  1.08
ipl         0.99
down        0.98
manship     0.96
erness      0.92
owitz       0.91
edly        0.91
Activations Density 0.642%