INDEX
Explanations
pronouns referring to individuals
references to pronouns indicating personal relationships and interactions
Negative Logits
Pens       -0.66
ucc        -0.61
Glob       -0.60
UTC        -0.60
Esk        -0.60
Mulcair    -0.59
Nikol      -0.58
limits     -0.58
Jindal     -0.57
hide       -0.56
Positive Logits
own        0.89
selves     0.80
oneliness  0.76
reditary   0.76
abba       0.74
self       0.72
pherd      0.71
essage     0.71
OWN        0.70
heter      0.69
Activations Density 0.493%