Explanations
mentions of different forms of abuse
references to instances of abuse
Negative Logits
Token     Logit
izen      -0.79
travel    -0.77
cil       -0.72
pard      -0.72
views     -0.68
soType    -0.68
ebus      -0.67
compr     -0.65
Towns     -0.64
zig       -0.62
Positive Logits
Token         Logit
abuse         1.05
perpetrated   1.04
inflicted     0.99
abuse         0.97
victims       0.93
abusing       0.93
survivors     0.90
abusers       0.89
allegations   0.86
abuses        0.85
Activations Density: 0.042%