INDEX
Explanations
descriptions or mentions of injuries
mentions of injuries or harm caused to individuals
Negative Logits
gency            -0.79
gran             -0.71
perm             -0.70
minist           -0.69
ingen            -0.65
ellipt           -0.63
algorithm        -0.62
arten            -0.62
SpaceEngineers   -0.61
ramid            -0.61
Positive Logits
jured            0.88
survivors        0.85
Survivors        0.81
injured          0.80
victims          0.79
bystanders       0.78
adoes            0.76
wounded          0.76
injuring         0.75
../              0.75
Activations Density: 0.022%
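
The density figure is usually read as the fraction of sampled token positions on which the feature has a nonzero activation. A minimal sketch of that calculation follows, assuming this reading applies here. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, feature active on 22 of them.
acts = [0.0] * 100_000
for i in range(22):
    acts[i * 1000] = 1.0  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low means the feature is sparse: it fires on only a handful of tokens per ten thousand, which fits a feature tied to a narrow topic such as injuries.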