Explanations
statements about physical abuse
Negative Logits
majority    -0.61
°           -0.60
ories       -0.59
ģĸ          -0.58
Coalition   -0.58
Siege       -0.56
heny        -0.55
profits     -0.55
Centers     -0.54
ĨĴ          -0.53
Positive Logits
selves      1.10
atic        1.09
atically    1.08
self        1.01
personally  0.97
lees        0.91
adows       0.91
selves      0.85
andering    0.83
soever      0.82
Activations Density: 3.208%