INDEX
Explanations
instances related to physical violence and abuse
terms associated with violence and abuse
Negative Logits
soDeliveryDate       -0.96
soType               -0.92
isSpecialOrderable   -0.85
izen                 -0.77
alg                  -0.73
aer                  -0.72
agall                -0.72
natureconservancy    -0.71
travel               -0.70
Balance              -0.69
Positive Logits
abusive        0.92
unarmed        0.88
victims        0.83
handcuffed     0.80
abusers        0.79
interrog       0.78
punishments    0.78
violence       0.78
violently      0.77
pepper         0.77
Activations Density 0.193%
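
The page does not define how the density figure is computed. One common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The source page does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 tokens, the feature fires on 2 of them.
sample = [0.0] * 998 + [1.5, 0.3]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.193% would mean the feature fires on roughly 2 out of every 1000 sampled tokens.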