INDEX
Explanations
incidents of conflict or aggression
Negative Logits
arse         -0.68
quished      -0.68
rontal       -0.67
raltar       -0.66
Cosponsors   -0.66
ovie         -0.63
ossibility   -0.62
pione        -0.62
osponsors    -0.60
roxy         -0.60
Positive Logits
us           0.90
him          0.86
theirs       0.80
me           0.77
them         0.72
yours        0.69
hers         0.66
him          0.63
their        0.63
ours         0.62
Activations Density: 0.355%