INDEX
    Explanations

    concepts related to discrimination and bias based on identity characteristics

    Negative Logits
    Token              Logit
    "linkplain"        -0.16
    "ÙĪÙĦات"           -0.16
    "grily"            -0.16
    "oles"             -0.15
    "oise"             -0.15
    "ModelIndex"       -0.15
    "QualifiedName"    -0.15
    "á»ĥn"             -0.14
    "eselect"          -0.14
    " Ñĥмов"           -0.14
    Positive Logits
    Token              Logit
    " race"            0.33
    "race"             0.26
    " gender"          0.25
    " Race"            0.24
    "Race"             0.23
    " nationality"     0.23
    " sex"             0.22
    " status"          0.22
    " religion"        0.22
    "age"              0.21
    Act Density 0.086%

    No Known Activations