INDEX
Explanations
references to iconic villain characters in animated films
Negative Logits
esty         -0.16
riz          -0.15
éĨ´          -0.15
irres        -0.15
çij          -0.15
embarrass    -0.14
Sac          -0.14
gii          -0.14
mirac        -0.14
isay         -0.14
Positive Logits
evil         0.49
Evil         0.41
villain      0.39
evil         0.34
Vill         0.31
villains     0.31
vill         0.30
sinister     0.29
malignant    0.28
male         0.27
Activations Density 0.086%