INDEX
Explanations
personal pronouns followed by verbs related to actions or states
pronouns indicating possession or performative actions
Negative Logits
Token      Logit
jri        -0.80
amaz       -0.70
itive      -0.68
aukee      -0.66
classic    -0.63
ibaba      -0.63
ieties     -0.62
Atkinson   -0.62
į          -0.61
downs      -0.61
Positive Logits
Token      Logit
'll         1.04
'd          1.02
could       0.96
're         0.93
've         0.92
should      0.90
ought       0.90
shouldn     0.88
cannot      0.87
might       0.85
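The logit tables list the vocabulary tokens whose output logits this feature pushes up (positive) or down (negative) most strongly. A common way to compute such scores, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and sort the result. The sketch below uses a hypothetical four-token vocabulary and made-up weights; `logit_effects` and `top_bottom` are illustrative names, not a library API.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], w_u: List[List[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix w_u of shape [d_model][vocab_size].
    Returns one logit-effect score per vocabulary token."""
    d_model = len(decoder_dir)
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_bottom(
    vocab: List[str], scores: List[float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom


# Toy example (hypothetical): 2-dimensional residual stream, 4-token vocabulary.
vocab = ["'ll", "could", "jri", "classic"]
decoder_dir = [1.0, 0.5]
w_u = [
    [0.8, 0.6, -0.5, -0.4],
    [0.4, 0.6, -0.6, -0.4],
]

scores = logit_effects(decoder_dir, w_u)
top, bottom = top_bottom(vocab, scores, k=2)
for token, score in top:
    print(f"+ {token:8s} {score:.2f}")
for token, score in bottom:
    print(f"- {token:8s} {score:.2f}")
```

Real dashboards may also fold in layer-norm scaling or other preprocessing before the projection; the sketch omits that.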
Activation Density 0.261%
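Activation density is usually reported as the share of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero. `activation_density` is a hypothetical helper and the activations are toy values, so its result is unrelated to the 0.261% reported for this feature.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`
    (assumed definition; threshold 0.0 counts any positive activation)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Toy sample: 1,000 tokens, of which 3 have a nonzero activation.
acts = [0.0] * 997 + [1.2, 0.4, 3.0]
print(f"Activation Density {activation_density(acts):.3f}%")
```

Three active tokens out of 1,000 gives a density of 0.3%; a feature at 0.261% would fire on roughly 2.6 tokens per 1,000.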