Explanations
mentions of loss or endangerment of human lives
references to the concept of lives being at risk or lost
Negative Logits
ractive       -0.67
phabet        -0.66
CAST          -0.65
gomery        -0.64
atchewan      -0.63
NetMessage    -0.63
orney         -0.62
agger         -0.61
ority         -0.60
Mack          -0.60
Positive Logits
lihood        0.97
chool         0.82
mares         0.80
guards        0.78
journal       0.77
Forever       0.76
behind        0.74
lives         0.72
liness        0.72
ously         0.72
Activations Density: 0.014%