Explanations
words related to heroism or being described as a hero
references to the concept of a "hero."
Negative Logits
aeda          -1.00
ateur         -0.81
imentary      -0.75
independent   -0.70
ntil          -0.68
creen         -0.67
olen          -0.66
ktop          -0.65
ossier        -0.64
millenn       -0.64
Positive Logits
hero          1.06
heroine       0.93
Hero          0.93
hero          0.92
heroes        0.86
ically        0.85
ku            0.85
ines          0.77
rities        0.75
士            0.72
Activations Density: 0.010%