INDEX
Explanations
quotations and dialogue tags
the pronoun "he" repeatedly used in various contexts
Negative Logits
Maid           -0.67
fit            -0.63
BALL           -0.61
hindsight      -0.61
anking         -0.60
interfering    -0.60
rocket         -0.60
iries          -0.59
âϦ            -0.59
split          -0.58
Positive Logits
'd             1.03
wrote          0.97
'll            0.93
lamented       0.86
said           0.85
joked          0.83
tweeted        0.82
mos            0.82
said           0.79
says           0.76
Activations Density: 0.191%