INDEX
    Explanations

    phrases related to social perceptions and stereotypes

    Negative Logits
    Token              Logit
    "vine"             -0.79
    "alde"             -0.79
    "ulo"              -0.75
    "orst"             -0.71
    "imentary"         -0.70
    "hement"           -0.70
    " depended"        -0.68
    "bis"              -0.68
    "fam"              -0.67
    "interrupted"      -0.67

    Positive Logits
    Token              Logit
    " masculinity"      0.85
    " criminality"      0.77
    " how"              0.74
    " reality"          0.73
    " sexuality"        0.71
    " life"             0.71
    " events"           0.71
    " perfection"       0.69
    " homosexuality"    0.69
    " morality"         0.68
    Activation Density: 0.082%

    No Known Activations