Explanations
occurrences of personal pronouns and expressions of personal identity
Negative Logits
Token       Logit
screamed    -0.23
scream      -0.21
screams     -0.21
cries       -0.19
screaming   -0.19
blinked     -0.19
cry         -0.18
shouted     -0.18
yelled      -0.17
cried       -0.17
Positive Logits
Token       Logit
purs         0.24
gest         0.20
cock         0.19
drum         0.19
incl         0.19
incl         0.18
hm           0.18
straight     0.17
eyeb         0.17
nod          0.17
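Each logit list shows ten tokens, apparently the ten lowest and ten highest entries of a per-token logit effect. The sketch below illustrates only that ranking step, assuming the effects are already available as (token, value) pairs. The function name `top_and_bottom` is hypothetical. In many SAE dashboards the effect vector is obtained by projecting the feature's decoder direction through the model's unembedding, but that step is an assumption and is not shown. A list of pairs is used rather than a dict because the same visible string (such as "incl") can belong to two different tokens.

```python
# Hypothetical helper: rank (token, logit_effect) pairs into the
# "Positive Logits" and "Negative Logits" lists.
# A list of pairs (not a dict) keeps distinct tokens that print identically.


def top_and_bottom(effects: list[tuple[str, float]], k: int = 10):
    """Return (top, bottom): the k largest effects in descending order
    and the k smallest effects in ascending order."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    bottom = ranked[:k]          # most negative effect first
    top = ranked[::-1][:k]       # most positive effect first
    return top, bottom


if __name__ == "__main__":
    sample = [("screamed", -0.23), ("scream", -0.21), ("nod", 0.17),
              ("gest", 0.20), ("purs", 0.24)]
    top, bottom = top_and_bottom(sample, k=2)
    print(top)     # [('purs', 0.24), ('gest', 0.2)]
    print(bottom)  # [('screamed', -0.23), ('scream', -0.21)]
```

If `k` exceeds half the number of tokens, the two lists overlap. With a full vocabulary this does not happen.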
Activation Density: 0.462%
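A density of 0.462% corresponds to the feature firing on roughly 462 of every 100,000 tokens. The sketch below assumes density is the percentage of sampled tokens whose activation is strictly above zero. The dashboard does not state its sample set or threshold, so both are assumptions, and `activation_density` is a hypothetical name.

```python
# Hypothetical sketch: activation density as the percentage of sampled
# tokens on which the feature's activation exceeds a threshold (0 by default).


def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Return the percentage of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # 462 firing tokens out of 100,000 reproduces a density of 0.462%.
    acts = [0.0] * (100_000 - 462) + [1.5] * 462
    print(f"{activation_density(acts):.3f}%")  # 0.462%
```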