INDEX
Explanations
violent actions or mentions of physical altercations
mentions of law enforcement and violence
Negative Logits
ISTORY           -0.69
Quantity         -0.63
zsche            -0.60
renaissance      -0.59
breeding         -0.59
vi               -0.59
cend             -0.58
yourselves       -0.57
ILCS             -0.56
phas             -0.56
Positive Logits
selves           0.78
inappropriately  0.78
soever           0.77
itch             0.76
itals            0.75
goodbye          0.73
via              0.69
senseless        0.69
alike            0.68
unconscious      0.67
Activations Density 0.773%