INDEX
    Explanations

    mentions of discrimination based on various factors, such as sexual orientation, race, and disability

    references to discrimination, particularly in legal or social contexts

    Negative Logits
    ski     -0.72
    Adds    -0.71
    cycle   -0.71
    tom     -0.70
    TOR     -0.68
    DCS     -0.68
    bold    -0.67
    hran    -0.66
    sis     -0.66
    links   -0.65
    Positive Logits
     discrimination   0.97
    rimination        0.92
     prejudice        0.85
     discriminated    0.84
     Discrimination   0.82
     protections      0.79
     retaliation      0.79
     slurs            0.78
     prejud           0.78
     discriminating   0.77
    Act Density 0.034%

    No Known Activations