INDEX
Explanations
phrases related to violent actions, particularly murder
instances of the word "murdered" and discussions surrounding violent death
Negative Logits
Token          Logit
issue          -0.78
rium           -0.71
lag            -0.71
CON            -0.69
worthiness     -0.68
wcsstore       -0.68
yrinth         -0.67
ffic           -0.67
regulated      -0.66
annis          -0.65
Positive Logits
Token          Logit
murdered       0.93
spree          0.89
throats        0.80
DAQ            0.75
murders        0.75
murdering      0.73
adoes          0.73
soever         0.72
murder         0.72
innoc          0.71
Activations Density: 0.024%