INDEX
Explanations
references to villains, often described in a negative context
terms related to villains in narratives
Negative Logits
galitarian     -0.81
ollen          -0.79
sterdam        -0.76
independent    -0.76
ikk            -0.70
issance        -0.70
chedel         -0.69
choes          -0.69
apers          -0.68
bles           -0.67
Positive Logits
ous            1.07
villain        1.02
ously          0.93
villains       0.92
Bane           0.90
mastermind     0.83
hattan         0.70
esses          0.70
scourge        0.69
CVE            0.68
Activations Density: 0.014%