INDEX
Explanations
proper nouns, likely related to political figures and events
instances of the verb "has."
Negative Logits
Interested    -0.76
seless        -0.66
Researchers   -0.65
winner        -0.63
OOL           -0.62
selves        -0.60
Recap         -0.59
Adults        -0.57
Measure       -0.57
IMAGES        -0.57
Positive Logits
been          1.37
vowed         1.16
apologized    1.14
campaigned    1.14
resigned      1.12
enegger       1.11
undergone     1.10
been          1.09
spoken        1.08
insisted      1.08
Activations Density 0.225%