INDEX
    Explanations

    references to heroes and heroism in various contexts

    Negative Logits
    roje      -0.17
    serter    -0.15
    VEC       -0.15
    ãĤ¤ãĤ¯    -0.15
    å¢ĵ       -0.15
    enor      -0.15
    oy        -0.15
    enko      -0.14
    erman     -0.14
    wers      -0.14

    Positive Logits
    ingles    0.17
    he        0.15
    ines      0.15
    ing       0.15
    oval      0.14
    inch      0.14
    ically    0.14
    اÙĨÙĩ     0.14
    ำ         0.14
    olin      0.13

    Act Density 0.041%

    No Known Activations