INDEX
Explanations
general instances of violence, specifically mentioning stabbing incidents
references to violent acts involving stabbing
Negative Logits
Token                 Logit
quickShipAvailable    -0.74
VO                    -0.71
gran                  -0.70
aut                   -0.69
avez                  -0.66
uph                   -0.65
mberg                 -0.65
Moder                 -0.64
mie                   -0.64
oS                    -0.64
Positive Logits
Token       Logit
stabbed     1.04
stabbing    1.00
stab        0.87
nesday      0.85
dagger      0.84
slit        0.81
spree       0.81
lished      0.80
wounds      0.80
knife       0.78
Activations Density: 0.006%