INDEX
Explanations
the pronoun 'I'
pronouns that express personal sentiment
Negative Logits
Uriel        -0.63
Sidney       -0.58
mutants      -0.58
Nora         -0.58
Gale         -0.57
illary       -0.57
cedes        -0.56
eworthy      -0.55
Alternative  -0.55
Violet       -0.55
Positive Logits
'm           1.57
've          1.29
am           1.18
suppose      1.05
RL           1.04
'd           1.02
myself       0.97
'll          0.94
presume      0.94
ggy          0.91
Activations Density: 0.219%