INDEX
    Explanations

    words related to heroic figures

    mentions of the word "hero" in various contexts

    Negative Logits
    Token          Logit
    "aeda"         -0.91
    "imentary"     -0.80
    "independent"  -0.77
    "olen"         -0.75
    "ateur"        -0.73
    "pora"         -0.69
    "ongyang"      -0.69
    "emporary"     -0.68
    "ntil"         -0.67
    "ighton"       -0.67
    Positive Logits
    Token           Logit
    " hero"         1.27
    " heroine"      1.12
    " heroes"       1.06
    "Hero"          0.98
    "hero"          0.97
    " protagonist"  0.93
    " Hero"         0.82
    "士"            0.81
    "ically"        0.79
    "rities"        0.76
    Activation Density: 0.008%

    No Known Activations