Explanations
terms related to abuse
references to instances of abuse
Negative Logits
Token     Logit
cil       -0.79
travel    -0.74
izen      -0.71
pard      -0.71
views     -0.70
ript      -0.67
vision    -0.66
aries     -0.65
pai       -0.65
soType    -0.64
Positive Logits
Token        Logit
perpetrated  0.97
inflicted    0.95
abuse        0.95
abuse        0.88
abusing      0.84
abused       0.80
victims      0.79
abusers      0.77
survivors    0.77
abuses       0.73
Activations Density 0.052%