INDEX
Explanations
references to domestic violence and related issues
Negative Logits
Token    Logit
unct     -0.16
irus     -0.16
herit    -0.16
guid     -0.15
nat      -0.15
arkin    -0.15
prism    -0.15
ansom    -0.14
remen    -0.14
ZD       -0.14
Positive Logits
Token     Logit
Domestic  0.32
domestic  0.30
Violence  0.28
violence  0.27
/dom      0.25
Dom       0.25
.Dom      0.24
batter    0.23
domest    0.22
IPV       0.22
Activations Density 0.063%
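
The density figure above is commonly read as the fraction of sampled token positions on which the feature is active. The sketch below illustrates that reading only. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard, and the tool's exact definition may differ.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "active" means strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 6 of which are active.
sample = [0.0] * 9994 + [1.2, 0.4, 3.1, 0.8, 2.2, 0.1]
density = activation_density(sample)

# Format as a percentage, in the same style as the dashboard figure.
print(f"{density:.3%}")
```

Under this reading, a density of 0.063% would correspond to roughly 6 active positions per 10,000 sampled tokens.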