Explanations
instances of loss and victimization involving individuals or groups
NEGATIVE LOGITS
Token      Logit
Kill       -0.23
Kill       -0.21
kill       -0.21
kill       -0.19
killing    -0.18
Kills      -0.18
kills      -0.16
Killing    -0.15
kills      -0.14
bào        -0.14
POSITIVE LOGITS
Token      Logit
died       0.27
die        0.26
succ       0.26
DIE        0.24
dies       0.23
perish     0.23
die        0.23
Die        0.21
suff       0.21
_die       0.20
Activations Density 0.100%
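
The density figure above can be read as the fraction of sampled tokens on which the feature fires. The page does not say how the figure is computed, so what follows is a minimal sketch under assumptions: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold. The source page does not state its rule.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 token out of 1000.
sample = [0.0] * 999 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.100%"
```

A density this low would mean the feature is sparse: it is silent on nearly every token and fires only in a narrow set of contexts.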