Explanations
references to antagonists in fictional stories
references to villains in narratives
Negative Logits
RT       -0.79
kt       -0.77
Input    -0.76
FM       -0.73
FM       -0.71
CLS      -0.71
IGHTS    -0.71
parts    -0.70
Mat      -0.69
ounds    -0.68
Positive Logits
villain        3.33
villains       2.63
antagonist     1.59
traitor        1.52
vill           1.30
coward         1.26
heroine        1.19
trope          1.16
treacher       1.15
philanthrop    1.11
Activations Density 0.029%