INDEX
Explanations
extremely negative descriptors related to violent actions or behaviors
Negative Logits
soñ              -0.64
hadas            -0.61
houſe            -0.61
RTLR             -0.60
refugi           -0.60
administrativos  -0.58
normaux          -0.58
Chwiliwch        -0.58
unanje           -0.58
flattered        -0.57
Positive Logits
cruel     1.15
vicious   1.10
vile      1.09
depra     1.06
diabo     1.05
savage    1.02
evil      1.02
ruthless  1.01
brutal    1.00
sinister  0.96
Activations Density 0.573%