Explanations
Occurrences of the pronoun "they."
Negative Logits
itself   -0.28
was      -0.21
isnt     -0.16
less     -0.16
ial      -0.16
st       -0.15
(es      -0.15
ly       -0.15
ther     -0.15
isn      -0.15
Positive Logits
’re          0.45
're          0.40
are          0.40
themselves   0.38
were         0.34
've          0.33
’ve          0.32
aren         0.28
'll          0.28
’ll          0.27
Activation Density: 0.214%
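A rough sketch of what the density figure means. This assumes that density is the share of tokens on which the feature's activation is above zero. That is a common convention on feature dashboards, but this page does not define it. Under that reading, 0.214% works out to about 2 tokens per 1,000, or roughly 1 in 467. The helper `activation_density` and the toy activations below are hypothetical and only illustrate the ratio.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption (not from the source page): density is
    (# tokens with activation > threshold) / (# tokens).
    The dashboard reports this as a percentage.
    """
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1,000 tokens, and the feature fires on 2 of them.
acts = [0.0] * 998 + [1.3, 0.7]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.200%
```

With real data, the activations would come from running the model over a large token corpus. A density near 0.2% indicates a sparse feature, which fits a narrow trigger such as the pronoun "they."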