Explanations
references to the self and personal thoughts, often starting with "I"
instances of the word "I," indicating a focus on personal expression and self-reference
Negative Logits
Token      Logit
rising     -0.63
marg       -0.61
arians     -0.59
Uriel      -0.59
Jarrett    -0.58
Aberdeen   -0.56
Pearson    -0.55
Vald       -0.55
Walton     -0.55
Rubio      -0.54
Positive Logits
Token      Logit
'm         1.56
've        1.31
'll        1.13
RL         1.04
'd         1.02
verson     1.02
am         0.99
ggy        0.97
suppose    0.96
KE         0.94
Activations Density: 0.281%