Explanations
pronouns related to the audience or the speaker
references to the collective "us."
Negative Logits
Token       Logit
lets        -0.64
fect        -0.64
tein        -0.61
CPC         -0.58
ussen       -0.57
livest      -0.57
Offic       -0.56
MAC         -0.55
stick       -0.55
chaired     -0.55
Positive Logits
Token       Logit
selves       1.20
hers         1.16
ern          0.91
urers        0.89
aning        0.89
ourselves    0.88
selves       0.85
ury          0.79
urious       0.79
leep         0.78
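Lists of positive and negative logits like these are commonly produced by taking the feature's decoder direction, computing its dot product with each vocabulary token's unembedding vector, and keeping the k highest and k lowest scores. The sketch below illustrates that ranking step under that assumption. The function names `logit_effects` and `top_and_bottom` are hypothetical, and the 2-dimensional vectors are made-up toy values, not weights from the model behind this page.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the (hypothetical) feature
    decoder direction with that token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return positive, negative

if __name__ == "__main__":
    # Toy values chosen for illustration only; not real model weights.
    decoder_dir = [1.0, 0.5]
    unembed = {
        "selves": [1.0, 0.4],
        "hers": [0.9, 0.5],
        "lets": [-0.5, -0.3],
        "CPC": [-0.4, -0.2],
    }
    pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
    print("positive:", pos)
    print("negative:", neg)
```

In a real model the vocabulary has tens of thousands of tokens, so a vectorized matrix product would replace the dictionary loop; the ranking logic stays the same.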
Activation Density: 0.062%
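Activation density is usually read as the fraction of sampled tokens on which the feature fires (activation above zero), so 0.062% corresponds to roughly 62 active tokens per 100,000. The sketch below computes that ratio, assuming a "greater than zero" threshold; the exact sample and threshold used by this viewer are not stated on the page, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds the threshold
    (assumed here to be 0.0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

if __name__ == "__main__":
    # Synthetic example: 62 active tokens out of 100,000.
    acts = [1.0] * 62 + [0.0] * (100_000 - 62)
    d = activation_density(acts)
    print(f"{d:.3%}")
```

A very low density like this one indicates a sparse feature that activates only in narrow contexts.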