Explanations
phrases related to physical harm, especially stabbing
references to stabbing incidents or related injuries
Negative Logits
Cosmos          -0.75
oS              -0.72
mberg           -0.70
Geographic      -0.69
XM              -0.68
Afric           -0.64
Organization    -0.64
AMA             -0.63
Leban           -0.63
VO              -0.63
Positive Logits
wounds          1.05
lished          0.98
slit            0.89
throats         0.84
nery            0.82
rampage         0.82
stabbing        0.81
stab            0.81
nesday          0.80
wrists          0.80
Activations Density: 0.029%