INDEX
Explanations
pronouns related to personal identity
occurrences of the word "themselves."
Negative Logits
amia      -0.73
ulton     -0.72
Sierra    -0.72
ammy      -0.71
grade     -0.69
pour      -0.67
yson      -0.67
Fulton    -0.67
emis      -0.66
MID       -0.64
Positive Logits
selves        1.10
tremend       0.95
selves        0.88
self          0.85
underwater    0.78
conduc        0.78
ashamed       0.76
submar        0.75
proport       0.75
creatively    0.74
Activations Density: 0.039%