INDEX
Explanations
names of politicians and political figures
proper nouns, particularly names and titles associated with individuals or entities
Negative Logits
nesday     -0.72
luster     -0.61
glers      -0.60
ancial     -0.60
eatures    -0.60
aukee      -0.57
rul        -0.56
wcs        -0.56
ail        -0.56
remem      -0.55
Positive Logits
ansas             0.79
inian             0.74
ondo              0.73
itect             0.69
eta               0.67
rary              0.66
Fatal             0.65
Spoiler           0.65
Schwarzenegger    0.64
eli               0.64
Activations Density: 0.071%