INDEX
Explanations
phrases related to various forms of abuse and misconduct
instances of various forms of abuse and violence
Negative Logits
zzo        -0.82
gue        -0.82
imester    -0.80
eport      -0.78
uve        -0.77
gio        -0.76
cade       -0.75
abase      -0.73
plates     -0.73
baum       -0.72
Positive Logits
vandalism       1.26
criminality     1.25
violence        1.24
persecution     1.22
intimidation    1.21
harassment      1.21
brutality       1.18
oppression      1.18
injustice       1.18
exploitation    1.17
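Dashboards of this kind usually derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the result. The sketch below assumes exactly that. The function names, the `[d_model][vocab_size]` layout of `unembed`, and the toy numbers are hypothetical and only illustrate the ranking step. They are not this dashboard's actual code.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot product of the decoder direction with each vocabulary column.

    Assumed layout: unembed[d][v] = weight from model dimension d to token v.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[d] * unembed[d][v] for d in range(len(decoder_dir)))
        for v in range(vocab_size)
    ]

def top_and_bottom(vocab: Sequence[str], effects: Sequence[float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) tokens with their effects."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Toy example with a hypothetical 2-dimensional model and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
unembed = [[1.0, 0.0, 2.0, -1.0],
           [0.0, 2.0, 0.0, 1.0]]
positive, negative = top_and_bottom(vocab, logit_effects(decoder_dir, unembed), k=2)
print(positive, negative)
```

A real model would use a matrix library rather than nested Python loops. The plain loops here just keep the projection-then-rank logic explicit.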
Activations Density 0.317%
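Activation density is commonly reported as the percentage of sampled token positions on which the feature activates. On that reading, 0.317% means the feature fires on roughly 3 of every 1,000 tokens. The sketch below assumes "activates" means an activation strictly above a threshold of zero. The function name and the `threshold` parameter are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds threshold.

    Assumption: a feature "fires" when its activation is strictly greater
    than threshold (0.0 by default).
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Hypothetical sample: the feature fires on 3 of 1,000 tokens.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
print(activation_density(sample))
```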