Explanations
references to superhero characters and related terms
references to specific characters and their attributes in superhero narratives
Negative Logits
Token     Logit
abet      -0.81
cript     -0.76
hip       -0.75
say       -0.73
pai       -0.73
peat      -0.72
acles     -0.69
etics     -0.68
itism     -0.66
opathy    -0.66
Positive Logits
Token     Logit
Ducks     0.76
aneers    0.74
Hulk      0.72
lda       0.71
Avenger   0.69
Drac      0.68
Breaker   0.68
oster     0.68
ernaut    0.66
inous     0.66
Activations Density: 0.092%