INDEX
Explanations
pronouns referring to a specific person (he, him)
repetitive references to the pronoun "he" in various contexts
Negative Logits
Token            Logit
earch            -0.80
htaking          -0.77
tones            -0.67
Mandatory        -0.66
anking           -0.66
entary           -0.65
Nationwide       -0.65
International    -0.64
umption          -0.64
Observatory      -0.64
Positive Logits
Token            Logit
'd               1.23
'll              1.14
knew             1.09
knows            1.04
thinks           1.03
eded             0.94
swore            0.94
hates            0.93
ctic             0.92
remembers        0.92
Activations Density: 0.303%