INDEX
Explanations
words associated with violence and brutality
Negative Logits
ICLE          -0.81
manuel        -0.72
Leilan        -0.70
mberg         -0.69
conservancy   -0.69
bles          -0.68
iasm          -0.68
OPLE          -0.68
ource         -0.67
arters        -0.66
Positive Logits
punishments   0.95
retribution   0.93
beasts        0.90
thug          0.90
criminals     0.88
punishment    0.88
honesty       0.87
predators     0.85
murderers     0.85
thugs         0.84
Activations Density: 0.136%