Explanations
first-person singular pronouns and verbs indicating thought or speculation
pronouns that indicate self-reference, particularly "I."
Negative Logits
Token      Logit
Walton     -0.64
ources     -0.63
rising     -0.62
Clancy     -0.61
tions      -0.61
Uriel      -0.61
Pearson    -0.60
Roose      -0.59
psons      -0.59
Thrones    -0.59
Positive Logits
Token      Logit
'm         1.49
am         1.05
've        1.02
verson     0.99
RL         0.96
ANA        0.93
'll        0.91
MAX        0.90
guess      0.88
myself     0.87
Activations Density 0.171%
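
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that "activations density" means the percentage of sampled tokens on which the feature fires with a non-zero activation; the page itself does not define the term. The function name `activation_density` and the sample data are hypothetical and chosen only to reproduce a value of the same size.

```python
def activation_density(activations):
    """Return the percentage of entries that are strictly positive.

    Assumption: a feature counts as "active" on a token when its
    activation is greater than zero.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# Hypothetical data: 171 active tokens out of 100,000 sampled tokens.
sample = [1.0] * 171 + [0.0] * (100_000 - 171)
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.171% would mean the feature is sparse: it fires on roughly 1 in every 585 tokens.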