INDEX
    Explanations

    adverbs or adjectives indicating success or proficiency

    significant and impactful words indicating success or authority

    Negative Logits
    "ieri"      -0.84
    "ivas"      -0.73
    "armac"     -0.70
    "fman"      -0.69
    "apult"     -0.66
    "irez"      -0.64
    "veyard"    -0.64
    " Kul"      -0.63
    "strom"     -0.63
    "к"         -0.62
    Positive Logits
    " moderator"    0.62
    " ire"          0.58
    " refers"       0.57
    " resumes"      0.57
    " ado"          0.56
    " quotes"       0.56
    "pedia"         0.56
    " quake"        0.55
    " infer"        0.55
    " blindness"    0.55
    Act Density 0.832%

    No Known Activations