INDEX
Explanations
descriptions of physical harm or threat of harm
phrases related to threats and violence against individuals
Negative Logits
atari             -0.82
portfolios        -0.78
Horizons          -0.77
ellen             -0.76
Transcript        -0.75
revamped          -0.75
soDeliveryDate    -0.73
Bermuda           -0.72
archives          -0.72
Kepler            -0.70
Positive Logits
violence          1.47
violence          1.41
harm              1.32
provocation       1.28
injure            1.24
aggression        1.22
robbery           1.21
aggress           1.21
violent           1.21
danger            1.21
Activations Density: 0.601%
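
Activation density is usually read as the share of sampled tokens on which the feature fires at all. The Python sketch below illustrates that reading. The function name, the zero threshold, and the sample activations are all assumptions made for illustration. None of them come from this page, and the dashboard's own computation may differ.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical data: 1000 sampled tokens, 6 of which activate the feature.
sample = [0.0] * 994 + [0.3, 1.2, 0.7, 2.1, 0.05, 0.9]
density = activation_density(sample)
print(f"{density:.3f}%")
```

A density well below 1%, like the one reported here, suggests the feature fires on only a small fraction of tokens, which fits a narrow topic such as violence and threats.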