Explanations
references to villains in various contexts
references to villains and their destructive traits or actions
Negative Logits
Token           Logit
aeda            -0.65
HELP            -0.63
authenticated   -0.61
OFFIC           -0.60
coerc           -0.59
ERSON           -0.59
livest          -0.58
cooperative     -0.55
ITNESS          -0.55
earners         -0.54
Positive Logits
Token           Logit
ous              2.84
ously            2.58
OUS              1.77
osity            1.43
uously           1.35
ized             1.32
izing            1.31
ising            1.29
istic            1.27
iously           1.25
Activation Density: 0.096%