INDEX
Explanations
instances of the pronoun "I" in various contexts
Negative Logits
ve    -0.31
t     -0.28
an    -0.28
l     -0.27
ke    -0.26
f     -0.24
r     -0.24
m     -0.24
n     -0.24
d     -0.23
Positive Logits
TERS    0.17
cntl    0.16
i       0.16
ií      0.16
mit     0.16
ウン    0.15
eee     0.15
AU      0.15
udic    0.15
ADE     0.15
Activations Density 0.053%