INDEX
Explanations
words related to domestic violence
references to domestic violence and related issues
Negative Logits
Reviewer    -0.83
peror       -0.80
*/(         -0.78
travel      -0.73
gob         -0.73
source      -0.71
Label       -0.71
aer         -0.70
çĦ          -0.70
oyal        -0.69
Positive Logits
violence    0.93
restraining 0.89
prevention  0.88
Violence    0.84
homicides   0.84
suff        0.81
abuse       0.79
stigma      0.76
Viol        0.76
violence    0.76
Activations Density: 0.033%