    Explanations

    words that denote leadership or refer to top-ranking entities or positions

    Negative Logits
    Token             Logit
    "se"              -0.16
    "ych"             -0.15
    "ment"            -0.15
    "sphere"          -0.14
    " a"              -0.14
    "umble"           -0.14
    "Bean"            -0.14
    "maz"             -0.14
    " environment"    -0.14
    " pie"            -0.14

    Positive Logits
    Token             Logit
    "-edge"           0.21
    "Escort"          0.18
    "ãĥ³ãĥĨ"          0.16
    "Ľ°"              0.15
    "ấp"              0.15
    "-flight"         0.15
    "strument"        0.14
    " Escort"         0.14
    "ierge"           0.14
    "irut"            0.14
    Activation Density: 0.010%

    No Known Activations