INDEX
Explanations
the pronoun "I" at a high activation level
first-person singular pronouns
Negative Logits
Token        Logit
rules        -0.62
apex         -0.62
excess       -0.62
Rapids       -0.61
Ħ¢           -0.60
Clair        -0.60
ses          -0.60
Shelby       -0.58
Georgetown   -0.56
Slav         -0.56
Positive Logits
Token        Logit
'm           1.35
've          1.18
nex          1.14
EEE          1.06
suppose      1.06
'll          1.03
stanbul      1.03
dunno        1.03
WI           1.02
zzy          0.98
Activations Density: 0.141%