INDEX
Explanations
phrases mentioning specific names, possibly related to leadership or prominence
occurrences of the letter 'e'
Negative Logits
ategory   -0.79
anova     -0.73
sidx      -0.71
rooms     -0.71
irtual    -0.70
iets      -0.70
NESS      -0.68
milo      -0.68
glim      -0.68
antha     -0.67
Positive Logits
lements   1.37
gger      1.11
cki       1.06
gypt      1.04
zza       1.03
ck        1.03
agle      1.03
lev       1.02
agles     1.00
ld        1.00
Activations Density: 0.035%