INDEX
Explanations
references to superheroes or heroic figures
mentions of "Hero" and related terms
Negative Logits
rup      -0.84
aeda     -0.78
Debor    -0.72
gdala    -0.71
creen    -0.70
nce      -0.69
mosp     -0.69
cone     -0.67
rea      -0.67
ORK      -0.66
Positive Logits
ic          0.79
osate       0.76
inic        0.74
Hero        0.73
Girls       0.70
ipeg        0.69
Acad        0.68
inous       0.68
inical      0.68
vernment    0.67
Activations Density 0.030%
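
The density figure is commonly read as the fraction of sampled tokens on which the feature fires at all. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are assumptions for illustration and are not taken from the dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    `activations` is a flat sequence of per-token activation values for one
    feature. Both the name and the default threshold are hypothetical.
    """
    values = list(activations)
    if not values:
        raise ValueError("activations must not be empty")
    active = sum(1 for value in values if value > threshold)
    return active / len(values)


# Made-up sample: 3 active tokens out of 10,000.
sample = [0.0] * 9997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.030% would mean the feature is active on roughly 3 of every 10,000 tokens.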