INDEX
Explanations
mentions of blood
references to violence and its implications
Negative Logits
Amend      -0.86
awaru      -0.75
IX         -0.71
Spac       -0.71
Lank       -0.70
VIDEOS     -0.69
ECH        -0.67
VICE       -0.64
acle       -0.62
cffffcc    -0.62
Positive Logits
thirst     1.52
bath       1.31
hound      1.28
stained    1.27
lust       1.20
shed       1.14
lines      1.12
thirsty    0.97
shot       0.97
spl        0.96
Activations Density 0.044%
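
The density figure is easier to read with a small worked example. The sketch below assumes that "Activations Density" means the percentage of token positions on which the feature's activation is above zero; the dashboard does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds threshold.

    Assumption: density is the share of token positions where the feature
    fires at all. The dashboard itself does not spell out its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 token positions, 4 of which activate the feature.
example = [0.0] * 9996 + [0.8, 1.3, 0.2, 2.1]
density = activation_density(example)
print(f"Activations Density {density:.3f}%")  # prints "Activations Density 0.040%"
```

Under that reading, a density of 0.044% corresponds to roughly 4 or 5 active positions per 10,000 tokens, which fits a feature that responds to a narrow topic such as blood.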