Explanations
phrases related to different forms and contexts of abuse, including financial abuse and historical abuse allegations
mentions of abuse in various contexts
Negative Logits
Token     Logit
cil       -0.80
travel    -0.74
pard      -0.73
views     -0.71
izen      -0.70
compr     -0.68
cise      -0.67
vision    -0.66
ript      -0.66
pai       -0.65
Positive Logits
Token         Logit
abuse         0.93
inflicted     0.89
perpetrated   0.87
abusing       0.86
abuse         0.83
abused        0.77
abusers       0.76
victims       0.76
Abuse         0.74
abuses        0.73
Activations Density 0.037%