INDEX
Explanations
terms related to violence and its societal implications
Negative Logits
amework       -0.16
esty          -0.15
.datas        -0.15
izar          -0.15
evice         -0.15
rowsable      -0.15
sqlCommand    -0.15
onas          -0.14
insults       -0.14
otas          -0.14
Positive Logits
kill             0.25
violence         0.24
killing          0.21
sad              0.21
kills            0.20
kill             0.20
justification    0.18
dispatch         0.18
Violence         0.18
killings         0.17
Activations Density: 0.304%