    Explanations

    terms related to different categories such as race, class, gender, and other characteristics within a social context

    references to social categorizations and roles

    Negative Logits
    urtles                -0.76
    DonaldTrump           -0.75
    Newsletter            -0.67
    bledon                -0.63
     ÂŃ                   -0.61
     Remem                -0.59
    Salt                  -0.59
    isSpecialOrderable    -0.56
    Send                  -0.55
    displayText           -0.55
    Positive Logits
    /,                     1.73
    /                      1.68
    /)                     1.55
    /"                     1.53
    /?                     1.49
    /_                     1.46
    /(                     1.43
    /#                     1.42
    /.                     1.40
     combo                 1.32
    Activation Density: 0.130%

    No Known Activations