INDEX
Explanations
the pronoun "I" followed by various verbs, thoughts, and actions
references to the first-person perspective
Negative Logits
Rolls        -0.66
heads        -0.63
Emin         -0.63
eworthy      -0.58
Alternative  -0.57
iquette      -0.57
Chaser       -0.57
pires        -0.57
optics       -0.57
Canaver      -0.56
Positive Logits
'm           1.80
am           1.39
've          1.30
myself       1.21
ggy          1.11
RL           1.08
verson       1.07
'd           1.03
'll          1.00
zzo          0.97
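As a rough illustration of where tables like these usually come from, the sketch below assumes a common convention for sparse-autoencoder dashboards: each token's score is the dot product of the feature's decoder direction with that token's unembedding vector, and the lists show the most negative and most positive scores. The dashboard may also apply normalization or other adjustments, so treat this as an assumption. The names `logit_effects`, `decoder_dir`, and `unembed_rows` are hypothetical.

```python
def logit_effects(decoder_dir, unembed_rows, vocab, k=10):
    """Rank tokens by how strongly a feature direction pushes their logits.

    Hypothetical sketch: decoder_dir is the feature's decoder vector (length d_model),
    unembed_rows[i] is the unembedding vector for vocab[i] (also length d_model).
    """
    scores = []
    for token, row in zip(vocab, unembed_rows):
        # Direct logit effect: dot product of the feature direction with the token's unembedding.
        score = sum(d * u for d, u in zip(decoder_dir, row))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda ts: ts[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example with a 2-dimensional residual stream and four tokens.
pos, neg = logit_effects(
    decoder_dir=[1.0, 0.0],
    unembed_rows=[[2.0, 0.0], [-1.0, 0.0], [0.0, 5.0], [0.5, 1.0]],
    vocab=["'m", "Rolls", "x", "am"],
    k=2,
)
print(pos)  # [("'m", 2.0), ('am', 0.5)]
print(neg)  # [('Rolls', -1.0), ('x', 0.0)]
```

Under this reading, a feature that promotes 'm, am, 've, and 'd is writing in a direction aligned with the unembeddings of tokens that typically follow "I".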
Activation Density: 0.280%
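Activation density is commonly read as the fraction of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly greater than zero, a common convention but not something this dashboard states. Under that assumption, 0.280% works out to about 280 active tokens per 100,000. The function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose feature activation exceeds `threshold`.

    `activations` is a list of sequences, each a list of per-token activations.
    Assumption: a feature counts as active when its activation is strictly above 0.
    """
    total = 0
    active = 0
    for seq in activations:
        for a in seq:
            total += 1
            if a > threshold:
                active += 1
    if total == 0:
        raise ValueError("no token positions to measure")
    return active / total


# Two toy sequences, 8 tokens in total, 2 of them active.
print(activation_density([[0.0, 1.2, 0.0], [0.0, 0.0, 0.5, 0.0, 0.0]]))  # 0.25

# Expected active-token count at the reported density of 0.280%.
print(round(0.0028 * 100_000))  # 280
```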