INDEX
    Explanations

    references to marginalized groups and discussions regarding systemic inequality

    Negative Logits
    " people"        -0.64
    " someone"       -0.63
    "someone"        -0.62
    " individuals"   -0.62
    " itself"        -0.60
    "somebody"       -0.60
    " Itself"        -0.59
    " somebody"      -0.58
    " mensen"        -0.58   (Dutch: "people")
    "itself"         -0.58
    Positive Logits
    " whom"           1.03
    "whom"            0.82
    " whose"          0.62
    "whose"           0.58
    "Whom"            0.57
    " who"            0.57
    " Whom"           0.53
    " wheelchairs"    0.52
    " الذين"          0.51   (Arabic: "who" / "those who")
    "who"             0.50
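The dashboard does not state how these logit scores are computed. A common convention for sparse-autoencoder features, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding and list the lowest- and highest-scoring tokens. The helper names (`logit_effects`, `top_and_bottom`) and the two-dimensional toy unembedding are hypothetical, and the numbers they produce are illustrative, not the values listed above.

```python
# Hypothetical sketch: score each vocabulary token by a feature's direct logit effect.
# Assumption (not from the dashboard): effect = decoder direction . token unembedding.

def logit_effects(decoder_dir, unembed):
    """decoder_dir: list of d floats; unembed: dict mapping token -> list of d floats."""
    return {
        tok: sum(a * b for a, b in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }

def top_and_bottom(effects, k):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]           # most negative first
    positive = ranked[::-1][:k]     # most positive first
    return negative, positive

# Toy 2-d unembedding (made-up vectors, not real model weights).
toy_unembed = {
    " whom": [1.0, 0.5],
    " whose": [0.6, 0.2],
    " people": [-0.5, -0.3],
    " itself": [-0.4, -0.4],
}
effects = logit_effects([1.0, 0.5], toy_unembed)
neg, pos = top_and_bottom(effects, k=2)
print("Negative:", neg)
print("Positive:", pos)
```

In a real vocabulary, " whom" (with a leading space) and "whom" are separate tokens with separate unembedding vectors, which is why both variants appear in the lists with different scores.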
    Act Density 0.635%
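An activation density of 0.635% means the feature fires on roughly 6.35 of every 1,000 tokens, or about 1 token in 157. The dashboard does not give the threshold or the sample of text used. The sketch below assumes density is the fraction of token positions whose activation is strictly greater than zero. `activation_density` is a hypothetical helper, and the toy activations are invented.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold.

    Assumption: 'firing' means activation > threshold (default 0.0).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Toy data: 7 firing positions out of 1,000 (not real activations for this feature).
toy_acts = [0.0] * 993 + [1.2] * 7
density = activation_density(toy_acts)
print(f"Act Density {density:.3%}")
```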

    No Known Activations