Explanations
mentions of physical bodily fluids or acts of violence
words and phrases related to blood and violence
Negative Logits
Token        Logit
awaru        -0.92
Amend        -0.80
IX           -0.77
merce        -0.75
Lank         -0.74
ECH          -0.72
OPLE         -0.72
VICE         -0.70
srfAttach    -0.69
AMA          -0.68
Positive Logits
Token        Logit
thirst        1.46
bath          1.38
stained       1.22
hound         1.18
shed          1.16
lust          1.14
thirsty       0.99
vessels       0.94
lines         0.93
wine          0.91
Activation density: 0.026%
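Activation density is typically reported as the percentage of sampled tokens on which the feature's activation is above zero, so 0.026% corresponds to roughly 26 firing tokens per 100,000. The sketch below illustrates that calculation under the assumption that the feature's activations are available as a flat list of floats; the function name `activation_density` and the sample data are hypothetical, not taken from the dashboard's own tooling.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature "fires" on a token when its activation is > 0.
    """
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens on which the feature is active.
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 26 firing tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(26):
    acts[i * 1000] = 1.5

print(f"{activation_density(acts):.3f}%")
```

A density this low indicates a sparse feature that fires only on a narrow set of contexts, consistent with the specific blood- and violence-related explanations above.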