INDEX
Explanations
instances of words related to harm, injury, or destruction
references to casualties and related terminology in the context of war and conflict
Negative Logits
perm         -0.83
ramid        -0.74
itudinal     -0.69
por          -0.67
odes         -0.67
erson        -0.64
enlightened  -0.63
appa         -0.62
Collider     -0.62
angled       -0.62
Positive Logits
casualties   1.17
bystanders   0.95
casualty     0.93
inflicted    0.87
incurred     0.83
suffered     0.79
Victims      0.76
victims      0.74
civilians    0.73
losses       0.73
Activations Density: 0.022%