INDEX
Explanations
proper nouns, specifically names related to politics and leadership
mentions of specific individuals or characters
Negative Logits
Token       Logit
Whitman     -0.70
Fed         -0.68
Sed         -0.68
Plum        -0.65
Bullets     -0.65
Reviewer    -0.64
Spread      -0.64
REDACTED    -0.63
Harriet     -0.63
Sergeant    -0.63
Positive Logits
Token       Logit
emer        0.79
ymes        0.77
ippers      0.76
\\\\        0.76
agan        0.75
oÄŁ         0.74
ark         0.71
oos         0.70
pid         0.70
ille        0.69
Activations Density: 0.032%
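
The page does not say how the density figure is computed. A common reading is the fraction of sampled token positions on which the feature activates at all. The sketch below illustrates that reading only. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature fires.

    Assumption: a position counts as "active" when its activation is
    strictly greater than `threshold`. The source does not define this.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which activate the feature.
acts = [0.0] * 10_000
for i in (12, 4_500, 9_999):
    acts[i] = 1.7

density = activation_density(acts)
print(f"{density:.3%}")  # formats the fraction as a percentage
```

Under this reading, a density of 0.032% would mean the feature fires on roughly 3 of every 10,000 sampled tokens, which is typical of a sparse feature.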