INDEX
    Explanations

    expressions related to inclusivity and universality

    Negative Logits
    Token         Logit
    "ega"         -0.14
    "een"         -0.14
    "eyn"         -0.14
    "uraa"        -0.14
    " McMahon"    -0.14
    "aeda"        -0.13
    "overy"       -0.13
    "rega"        -0.13
    "enerima"     -0.13
    "ura"         -0.13
    Positive Logits
    Token         Logit
    " sake"       0.24
    " purposes"   0.23
    "andler"      0.17
    "opensource"  0.15
    "oug"         0.15
    "vä"          0.15
    " reasons"    0.15
    "vell"        0.15
    "pur"         0.15
    " instance"   0.15
    Activation Density: 0.060%

    No Known Activations