INDEX
    Explanations

    references to discrimination based on various criteria such as race, gender, sexual orientation, and physical characteristics

    concepts related to discrimination and bias based on various personal characteristics

    Negative Logits
        -0.76  "Reviewer"
        -0.66  " Purg"
        -0.65  "metal"
        -0.64  "jet"
        -0.63  "bage"
        -0.62  " Sunder"
        -0.61  "////////////////////////////////"
        -0.61  "bj"
        -0.61  "uckland"
        -0.61  "invoke"

    Positive Logits
        1.12  " ethnicity"
        1.11  " nationality"
        1.02  " gender"
        0.92  " geography"
        0.91  " colour"
        0.88  " severity"
        0.87  " likeness"
        0.85  " resemblance"
        0.85  " proximity"
        0.85  " color"
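Dashboards of this kind commonly build the negative and positive logit lists by projecting the feature's decoder direction onto the model's unembedding matrix and ranking every vocabulary token by the result. This page does not state its method, so the following is only a minimal sketch under that assumption. The function `top_logit_tokens`, the names `w_dec` and `W_U`, the assumed matrix layouts, and the toy random data are all hypothetical, not the dashboard's code.

```python
import numpy as np

def top_logit_tokens(w_dec, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs
    obtained by projecting one feature's decoder direction through W_U."""
    # Assumed layouts: w_dec is (d_model,), W_U is (d_model, d_vocab)
    scores = w_dec @ W_U                      # direct logit effect per token
    order = np.argsort(scores)                # indices sorted ascending by score
    neg = [(vocab[i], float(scores[i])) for i in order[:k]]
    pos = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return neg, pos

if __name__ == "__main__":
    # Toy random matrices standing in for a real model (hypothetical data)
    rng = np.random.default_rng(0)
    d_model, d_vocab = 8, 20
    vocab = [f"tok{i}" for i in range(d_vocab)]
    W_U = rng.normal(size=(d_model, d_vocab))
    w_dec = rng.normal(size=d_model)
    neg, pos = top_logit_tokens(w_dec, W_U, vocab, k=3)
    print("negative:", neg)
    print("positive:", pos)
```

Under this reading, a token such as " ethnicity" at 1.12 is one whose logit the feature pushes up most directly, while "Reviewer" at -0.76 is one it pushes down.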
    Activation Density: 0.296%
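Activation density is usually reported as the share of sampled token positions on which the feature's activation is above zero; under that definition, 0.296% means the feature fires on roughly 3 of every 1,000 tokens. The sketch below assumes exactly that definition (activation strictly above a threshold of 0). The function `activation_density` and the toy activations are hypothetical illustrations, not the dashboard's implementation.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(acts, dtype=float)
    # Assumed firing criterion: activation strictly greater than the threshold
    firing = np.count_nonzero(acts > threshold)
    return 100.0 * firing / acts.size

if __name__ == "__main__":
    # Hypothetical activations over 8 token positions; 2 of them fire
    toy_acts = [0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0]
    print(activation_density(toy_acts))  # 25.0
```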

    No Known Activations