Explanations
words related to violent or harsh actions or situations
references to violence and brutality
Negative Logits
arten          -0.79
iasm           -0.79
ucket          -0.75
manuel         -0.73
kj             -0.72
glas           -0.71
Bundle         -0.70
bub            -0.70
conservancy    -0.70
leaf           -0.69
Positive Logits
punishments    1.04
cruelty        0.96
beasts         0.92
torture        0.92
murders        0.91
punishment     0.91
retribution    0.91
brutality      0.88
Slaughter      0.87
murder         0.84
Activation Density: 0.107%