Explanations
words related to villains, especially when they are described as dangerous or menacing
references to villains in narratives
Negative Logits
Token         Logit
ollen         -0.82
galitarian    -0.81
independent   -0.79
issance       -0.74
sterdam       -0.73
press         -0.71
choes         -0.71
obar          -0.71
apers         -0.69
rique         -0.68
Positive Logits
Token         Logit
villain       1.14
ous           1.07
villains      1.02
ously         0.90
Bane          0.86
mastermind    0.85
antagonist    0.76
esses         0.73
satir         0.71
CVE           0.68
Activations Density: 0.013%