INDEX
    Explanations

    mentions of discrimination or inequality against minorities

    references to marginalized and minority groups

    Negative Logits
    Token          Logit
    "ENA"          -0.82
    "FIN"          -0.82
    "CHA"          -0.76
    "amina"        -0.76
    "rol"          -0.74
    "×ŀ"           -0.73
    "ר"            -0.72
    "PT"           -0.72
    "rolog"        -0.71
    "×"            -0.71
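    Positive Logits
    Token          Logit
    " minorities"   1.08
    " genders"      0.99
    "rats"          0.89
    " minority"     0.83
    " backgrounds"  0.80
    "eatures"       0.79
    "ecided"        0.77
    " whites"       0.77
    " unemploy"     0.75
    " males"        0.75
    (Tokens are quoted because a leading space is part of the token.)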
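Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most positive and most negative entries. The sketch below assumes exactly that; the names `top_logit_effects`, `decoder_vec` and `W_U` are hypothetical, and the toy inputs are made up. A real dashboard may add further steps (for example, folding in the final layer norm), so this will not reproduce the values above.

```python
import numpy as np

def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Hypothetical helper: direct logit effect of one feature.

    decoder_vec: shape (d_model,), the feature's decoder row (assumed).
    W_U: shape (d_model, d_vocab), the unembedding matrix (assumed).
    vocab: list of d_vocab token strings.
    Returns (positive, negative) lists of (token, logit) pairs.
    """
    logits = decoder_vec @ W_U            # one logit per vocabulary token
    order = np.argsort(logits)            # ascending order of logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: identity unembedding, so the logits equal the decoder vector.
pos, neg = top_logit_effects(np.array([0.5, -1.0, 2.0]), np.eye(3),
                             ["x", "y", "z"], k=2)
print(pos)  # [('z', 2.0), ('x', 0.5)]
print(neg)  # [('y', -1.0), ('x', 0.5)]
```

With a real model, `vocab` would be the tokenizer's decoded token strings, which is why byte-level fragments such as "×" can appear in the lists.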
    Activation Density: 0.005%
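Activation density is the share of tokens on which the feature fires. At 0.005% that is about 1 token in 20,000. A minimal sketch, assuming "fires" means an activation above zero over some sample of tokens; the helper name `activation_density` and the synthetic activations are hypothetical, and the threshold and token sample the dashboard actually uses are not stated.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of tokens with activation > threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Synthetic sample: 5 active tokens out of 100,000.
acts = np.zeros(100_000)
acts[:5] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.005%
```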

    No Known Activations