Explanations
phrases related to violent acts and their victims
Negative Logits
Token       Logit
idth        -0.74
Applic      -0.68
FML         -0.68
RET         -0.65
cancell     -0.65
osterone    -0.61
Balt        -0.61
ãĥ¼ãĥĨ      -0.60
WARN        -0.60
Īè          -0.59
Positive Logits
Token       Logit
unarmed     1.04
innoc       1.00
innocent    0.98
indiscrim   0.91
senseless   0.91
bystanders  0.87
fleeing     0.86
peacefully  0.86
hostage     0.83
hostages    0.81
Activations Density: 0.175%