    Explanations

    negative actions or attributes associated with dehumanizing, demonizing, stigmatizing, or vilifying individuals or groups

    words related to social stigmatization and dehumanization

    Negative Logits
    Token           Logit
    "UTERS"         -0.69
    "oret"          -0.62
    "negie"         -0.60
    " enthusi"      -0.59
    "uid"           -0.59
    "INC"           -0.58
    "cffff"         -0.57
    " Prospect"     -0.57
    "stead"         -0.57
    "NH"            -0.56
    Positive Logits
    Token           Logit
    " slurs"        0.92
    "imaru"         0.90
    " stereotypes"  0.88
    " stigma"       0.80
    " prejudice"    0.77
    " vil"          0.77
    " insults"      0.76
    " bullies"      0.76
    " stigmat"      0.75
    " dehuman"      0.74
    Activation Density: 0.079%

    No Known Activations