INDEX
Explanations
the third-person object pronoun “them.”
Negative Logits
a         -0.12
A         -0.10
Nicola    -0.09
а         -0.08
A         -0.08
te        -0.08
ar        -0.08
ro        -0.08
e         -0.08
la        -0.08
Positive Logits
them      0.16
him       0.15
Him       0.13
HIM       0.11
Them      0.11
THEM      0.10
him       0.10
ham       0.09
emen      0.09
Hem       0.09
Activations Density 0.082%