INDEX
Explanations
occurrences of the pronoun "I"
Negative Logits
pedia       -0.18
INCLUDED    -0.17
a           -0.16
aling       -0.16
p           -0.16
e           -0.16
vro         -0.15
pars        -0.15
áºŃn        -0.15
onym        -0.15
Positive Logits
.e          0.23
E           0.20
L           0.19
omanip      0.18
G           0.17
M           0.17
ylland      0.17
F           0.17
N           0.17
D           0.16
Activations Density: 0.044%