INDEX
Explanations
personal pronouns and possessive pronouns referring to individuals or groups
references to personal pronouns or mentions of individuals
Negative Logits
earch     -0.57
arsity    -0.55
atlantic  -0.51
Eleven    -0.50
idth      -0.48
noon      -0.48
inton     -0.48
Gulf      -0.47
Amin      -0.44
CCC       -0.44
Positive Logits
'll     0.84
've     0.83
'd      0.81
're     0.78
self    0.75
own     0.75
knew    0.61
selves  0.61
can     0.58
cannot  0.58
Activations Density 0.958%
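
The density figure can be read as the share of sampled token positions on which the feature is active. The sketch below assumes that reading; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means active positions / total positions.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 10 of them active.
toy = [0.0] * 990 + [1.5] * 10
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density of 0.958% would mean the feature fires on a little under 1 in 100 sampled tokens.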