INDEX
    Explanations

    phrases or words indicating a contrast or contradiction

    phrases or statements that challenge popular beliefs or expectations

    Negative Logits
    "oided"      -0.86
    "ross"       -0.71
    "estones"    -0.70
    "azz"        -0.70
    "BSD"        -0.67
    "lov"        -0.66
    "among"      -0.65
    "CLA"        -0.65
    "mail"       -0.65
    "ROM"        -0.63
    Positive Logits
    " prevailing"       0.78
    " stereotypical"    0.78
    " stereotypes"      0.78
    " stereotype"       0.77
    "ptions"            0.71
    " expectations"     0.71
    " conventional"     0.71
    " belie"            0.70
    " belief"           0.69
    " usual"            0.68
    Activation Density: 0.129%

    No Known Activations