INDEX
Explanations
words related to heroes or heroic acts
references to heroes or heroic figures
Negative Logits
Token        Logit
aeda         -0.93
ntil         -0.78
ateur        -0.75
imentary     -0.73
creen        -0.70
ighton       -0.69
acular       -0.68
uce          -0.66
gow          -0.66
cerning      -0.66
Positive Logits
Token        Logit
ically       1.02
ku           0.90
hero         0.90
ines         0.88
heroine      0.88
hero         0.85
heroes       0.84
Hero         0.80
ics          0.80
士           0.76
Activations Density: 0.027%