INDEX
    Explanations

    words related to bias, discrimination, and negative preconceived notions about certain groups of people

    terms related to prejudice and discrimination

    Negative Logits
    Token             Logit
    "Interstitial"    -0.85
    "VID"             -0.81
    "sis"             -0.76
    "adra"            -0.75
    "avez"            -0.75
    "ascus"           -0.74
    "incinn"          -0.73
    "ODE"             -0.72
    "irgin"           -0.70
    "ramid"           -0.70
    Positive Logits
    Token             Logit
    " prejudice"      1.28
    " prejud"         1.20
    " prejudices"     0.95
    "eering"          0.83
    "icial"           0.78
    " hatred"         0.78
    " intolerance"    0.74
    "ophobic"         0.74
    "wart"            0.73
    " towards"        0.70
    Activation Density: 0.011%

    No Known Activations