INDEX
    Explanations

    phrases referring to demographic attributes such as race, ethnicity, and gender

    phrases emphasizing racial and gender identities

    Negative Logits
    "Downloadha"            -0.80
    " externalToEVAOnly"    -0.72
    "iquid"                 -0.71
    " needles"              -0.67
    " VIDEOS"               -0.67
    "hyde"                  -0.66
    " livest"               -0.66
    "netflix"               -0.65
    "plementation"          -0.65
    "downs"                 -0.64
    Positive Logits
    " course"               0.98
    " whom"                 0.96
    " Colour"               0.95
    " stature"              0.93
    " colour"               0.90
    " varying"              0.89
    "course"                0.87
    " color"                0.83
    " renown"               0.82
    "sted"                  0.81
    Activation Density: 0.096%

    No Known Activations