Explanations
mentions of physical violence or attacks
terms related to various forms of assault and aggressive behavior
Negative Logits
FORE         -0.76
ãĤ©          -0.76
snipp        -0.72
views        -0.68
Solitaire    -0.68
ocular       -0.67
overed       -0.66
CFR          -0.66
iku          -0.65
bard         -0.64
Positive Logits
iveness      0.89
ive          0.85
uous         0.84
assault      0.84
perpetrated  0.82
ively        0.81
quez         0.79
victim       0.77
ingly        0.75
leveled      0.74
Activations Density 0.034%