Explanations
words related to heroic figures
mentions of the word "hero" in various contexts
Negative Logits
Token         Logit
aeda          -0.91
imentary      -0.80
independent   -0.77
olen          -0.75
ateur         -0.73
pora          -0.69
ongyang       -0.69
emporary      -0.68
ntil          -0.67
ighton        -0.67
Positive Logits
Token         Logit
hero          1.27
heroine       1.12
heroes        1.06
Hero          0.98
hero          0.97
protagonist   0.93
Hero          0.82
士            0.81
ically        0.79
rities        0.76
Activations Density: 0.008%