INDEX
Explanations
words related to violence and criminal activities
discussions of violence against marginalized groups, particularly women
NEGATIVE LOGITS
Token           Logit
answer          -0.85
terday          -0.81
rar             -0.79
DragonMagazine  -0.78
xon             -0.74
ername          -0.74
fen             -0.73
uyomi           -0.73
forward         -0.72
senal           -0.71
POSITIVE LOGITS
Token           Logit
minors           1.50
humans           1.36
individuals      1.35
juveniles        1.35
adults           1.35
adolescents      1.33
females          1.33
persons          1.33
infants          1.32
minorities       1.32
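Dashboards like this often produce the logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the per-token contributions. The sketch below assumes that procedure and ignores any final layer norm. The function names `logit_effects` and `top_and_bottom` and the toy matrices are hypothetical and do not reproduce the dashboard's actual code or weights.

```python
def logit_effects(decoder_dir, W_U, vocab):
    """Project a (hypothetical) decoder direction through an unembedding.

    decoder_dir: list of d_model floats
    W_U: d_model rows, each a list of len(vocab) floats
    Returns a dict mapping each token to its logit contribution.
    """
    effects = [0.0] * len(vocab)
    for i, d_i in enumerate(decoder_dir):
        # Accumulate d_i * W_U[i][t] for every token t (a vector-matrix product).
        for t, w in enumerate(W_U[i]):
            effects[t] += d_i * w
    return dict(zip(vocab, effects))


def top_and_bottom(effects, k):
    """Return the k most positive and k most negative (token, logit) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by logit
    return ranked[::-1][:k], ranked[:k]


# Toy example: 2-dimensional residual stream, 3-token vocabulary.
effects = logit_effects(
    [1.0, 0.5],
    [[1.0, -1.0, 0.5], [0.2, 0.0, 0.4]],
    ["minors", "answer", "humans"],
)
top, bottom = top_and_bottom(effects, 1)
print(top)     # [('minors', 1.1)]
print(bottom)  # [('answer', -1.0)]
```

With real weights, `k = 10` would yield two ten-row lists shaped like the positive and negative tables above.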
Activation density: 0.361%
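Activation density is usually read as the share of token positions on which the feature's activation is nonzero. On that reading, 0.361% means the feature fires on roughly 3.6 of every 1,000 tokens. The sketch below assumes a firing threshold of zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# One firing position out of four gives a density of 0.25 (25%).
print(activation_density([0.0, 1.2, 0.0, 0.0]))  # 0.25
```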