INDEX
    Explanations

    identifiers related to diversity and discrimination such as race, ethnicity, nationality, and religion

    references to social categories and identity markers such as race, ethnicity, gender, and disability

    Negative Logits
    Token           Logit
    "hiba"          -0.87
    "ocket"         -0.80
    "pload"         -0.70
    "alore"         -0.69
    "noon"          -0.67
    " Ack"          -0.66
    "Adds"          -0.64
    "put"           -0.63
    "akings"        -0.63
    "kens"          -0.62
    Positive Logits
    Token           Logit
    " ethnicity"    1.35
    " nationality"  1.13
    " gender"       1.09
    "ethnic"        1.06
    " ethnic"       1.05
    " genders"      1.03
    "Gender"        1.01
    "gender"        1.00
    " Ethnic"       0.98
    " Gender"       0.97
    Activation Density: 0.202%

    No Known Activations