Explanations
Descriptions of and references to violent actions and killings
Negative Logits
Token         Logit
602           -0.16
790           -0.16
Assault       -0.16
838           -0.15
lac           -0.14
recovery      -0.14
oped          -0.14
.Annotation   -0.13
934           -0.13
atan          -0.13
Positive Logits
Token         Logit
/exec         0.15
errat         0.15
æº            0.15
ligne         0.15
Ful           0.14
kill          0.14
æŃ            0.14
kills         0.14
ада           0.14
kills         0.14
Activation density: 0.179%