Explanations
words related to violent actions and resulting harm or danger
language that indicates harm, danger, or serious physical injury
Negative Logits
Compass         -0.78
Sprite          -0.70
cylinders       -0.68
audi            -0.67
Leaders         -0.66
quotas          -0.66
\":             -0.66
soDeliveryDate  -0.64
clus            -0.63
ellen           -0.63
Positive Logits
harm            1.65
injury          1.58
bodily          1.38
injuries        1.35
anguish         1.33
griev           1.31
harms           1.30
inconvenience   1.29
damage          1.28
damage          1.26
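The page does not say how these logit lists are computed. One common approach for this kind of dashboard is to project the feature's decoder direction through the model's unembedding and rank tokens by the resulting score. The sketch below does that in plain Python, which is enough to show the ranking.

This is an assumption, not this dashboard's documented method. The function names `logit_effects` and `top_and_bottom` are hypothetical, the toy vectors are invented for illustration, and any final layer norm is ignored.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch: score each vocabulary token by the dot product of a
# feature's decoder direction with that token's unembedding vector.
# Assumes a direct projection (no final layer norm), which may differ from
# how any particular dashboard computes its logit lists.
def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Most positive scores first, then most negative scores first.
    top = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    bottom = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return top, bottom

# Toy data (invented): a 2-dimensional decoder direction and four tokens.
decoder_dir = [1.0, 0.0]
unembed = {
    "harm": [1.5, 0.2],
    "injury": [1.2, -0.3],
    "Compass": [-0.8, 0.1],
    "audi": [-0.6, 0.5],
}
top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print(top)     # [('harm', 1.5), ('injury', 1.2)]
print(bottom)  # [('Compass', -0.8), ('audi', -0.6)]
```

With real model weights, the same ranking over the full vocabulary would produce lists shaped like the two above. The exact values depend on details this page does not state.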
Activation Density: 0.337%
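Activation density is usually read as the share of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero, with the threshold exposed as a parameter. The page does not state the exact definition, and the function name `activation_density` is hypothetical.

```python
from typing import Sequence

# Hypothetical sketch: fraction of token positions whose activation exceeds
# a threshold (assumed to be 0.0; the dashboard's own definition is not stated).
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# 337 active positions out of 100,000 would correspond to the 0.337% shown above.
acts = [1.0] * 337 + [0.0] * (100_000 - 337)
print(f"{100 * activation_density(acts):.3f}%")  # 0.337%
```

Under this reading, a density of 0.337% means the feature is active on roughly one token in three hundred. That is consistent with a fairly specific concept such as harm or injury.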