INDEX
Explanations
references to violence and casualties in political or conflict-related contexts
Negative Logits
damaging    -0.15
thief       -0.15
linkplain   -0.15
heimer      -0.14
_pci        -0.14
urum        -0.14
dam         -0.14
ÄĻż         -0.14
亡          -0.14
æIJ         -0.14
Positive Logits
Mass        0.45
mass        0.44
mass        0.41
massacre    0.41
slaughter   0.39
Mass        0.38
sla         0.38
_mass       0.34
butcher     0.34
massa       0.34
Activations Density 0.029%