INDEX
Explanations
pronouns referring to the self or specific individuals
pronouns indicating the speaker or listener
Negative Logits
jri       -0.81
aukee     -0.74
ibaba     -0.73
downs     -0.67
dden      -0.66
acular    -0.66
icist     -0.65
utions    -0.65
itcher    -0.65
isons     -0.64
Positive Logits
'd         0.81
shouldn    0.81
could      0.81
've        0.80
'll        0.79
might      0.78
cannot     0.77
should     0.76
ought      0.76
conflic    0.74
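The two lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. The sketch below shows one common way such lists are produced. It assumes each score is the dot product of the feature's decoder direction with a column of the model's unembedding matrix; the page does not state its exact formula or any normalization. The names `logit_effects` and `top_and_bottom`, and the `[d_model][vocab]` matrix layout, are hypothetical.

```python
from typing import Sequence


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> list[float]:
    """Direct logit effect of a feature direction on every vocabulary token.

    Assumes (hypothetically) that `unembed` is laid out as [d_model][vocab],
    so the effect on token t is sum_i decoder_dir[i] * unembed[i][t].
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], tokens: Sequence[str], k: int):
    """Return the k most negative and k most positive (token, score) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    bottom = [(tokens[t], effects[t]) for t in order[:k]]
    # Slice from len(order) - k so that k == 0 yields an empty list.
    top = [(tokens[t], effects[t]) for t in reversed(order[len(order) - k:])]
    return bottom, top


# Toy example: a 2-dimensional model with a 3-token vocabulary.
tokens = ["could", "jri", "the"]
decoder_dir = [1.0, 0.0]
unembed = [[0.8, -0.8, 0.1],
           [0.5, 0.5, 0.5]]
effects = logit_effects(decoder_dir, unembed)
bottom, top = top_and_bottom(effects, tokens, k=1)
print(bottom, top)
```

In the toy example, only the first model dimension contributes. As a result, "jri" receives the most negative score and "could" receives the most positive score, which mirrors the shape of the real lists above.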
Activation Density: 0.305%
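Activation density is usually read as the percentage of tokens on which the feature is active. Under that reading, 0.305% means the feature fires on roughly 3 tokens in every 1,000. The sketch below assumes that "active" means an activation strictly above a threshold of 0. The function name `activation_density` and the `threshold` parameter are hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: the feature fires on 3 of 1,000 tokens.
acts = [0.0] * 997 + [1.2, 0.4, 2.0]
print(activation_density(acts))
```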