INDEX
    Explanations

    words associated with stereotypes

    references to stereotypes and their implications or effects

    Negative Logits
    sterdam      -0.79
    ayan         -0.75
    ateur        -0.74
    ertodd       -0.72
    rique        -0.67
    inth         -0.66
    ighters      -0.66
    nesty        -0.65
    sis          -0.65
    ighth        -0.65
    Positive Logits
     stereotyp       1.01
     stereotypes     0.91
     stereotype      0.90
    覚醒             0.78
     clich           0.77
    rities           0.76
     depictions      0.75
     portrayal       0.73
     tropes          0.72
     Breaker         0.71
    Activation Density 0.019%

    No Known Activations