INDEX
    Explanations

    words related to work, pay, and possibly gender discrimination

    Negative Logits
    Token        Logit
    enance       -0.74
    luence       -0.68
    ablished     -0.63
    equality     -0.62
    resents      -0.61
    CONCLUS      -0.60
    ablish       -0.60
    Failure      -0.59
    edience      -0.58
    ifference    -0.58
    Positive Logits
    Token        Logit
    !).          1.52
    ?).          1.47
    !),          1.41
    !)           1.34
    ?),          1.33
    ).           1.25
    *)           1.24
    >)           1.23
    ?)           1.21
    )).          1.19
    Activation Density: 0.615%

    No Known Activations