INDEX
    Explanations

    terms associated with independent or marginalized groups and their experiences

    Negative Logits
    "idebar"     -0.18
    "tons"       -0.17
    "endas"      -0.16
    "ADATA"      -0.16
    "ARRIER"     -0.16
    "aved"       -0.16
    "IGHL"       -0.15
    "sworth"     -0.15
    "ROTO"       -0.15
    "uptools"    -0.14
    Positive Logits
    " ind"          0.22
    "idual"         0.22
    " Ind"          0.20
    "eterminate"    0.20
    "pend"          0.20
    "istinguish"    0.19
    "endent"        0.17
    "ilog"          0.17
    "uced"          0.17
    "gra"           0.15
    Activation Density: 0.039%

    No Known Activations