Explanations
references to different forms of abuse, particularly with a focus on child and sexual abuse
mentions of various forms of abuse
Negative Logits
Token         Logit
pard          -0.70
amins         -0.70
Kinnikuman    -0.69
gran          -0.68
printed       -0.67
ebus          -0.67
puter         -0.66
Gork          -0.66
nee           -0.65
prints        -0.65
Positive Logits
Token         Logit
abuse         1.27
abuse         1.25
abusing       1.15
abused        1.14
abuses        1.10
Abuse         1.04
misuse        0.88
abusers       0.87
reatment      0.76
allegations   0.73
Activations Density 0.010%