INDEX
Explanations
references to antagonists or "villains" in various contexts
references to villains in narratives or film
Negative Logits
Token         Logit
sterdam       -0.80
galitarian    -0.73
independent   -0.73
chedel        -0.71
independence  -0.68
DCS           -0.67
press         -0.67
porting       -0.67
glas          -0.66
oday          -0.65
Positive Logits
Token         Logit
ous           1.32
ously         1.10
villain       0.98
Bane          0.97
villains      0.90
ess           0.86
esses         0.86
mastermind    0.86
Dracula       0.81
OUS           0.79
Activations Density 0.038%