INDEX
    Explanations

    phrases related to demonizing or criticizing individuals or groups

    terms associated with demonization and stigma

    Negative Logits
    ippi        -0.85
    RAFT        -0.78
    IGH         -0.74
    ļéĨĴ        -0.71
    jri         -0.69
     Soda       -0.68
     Seym       -0.67
    orship      -0.67
     Seah       -0.65
     proble     -0.65
    Positive Logits
    stration    1.05
    iac         1.01
    ises        0.96
    ising       0.96
    ised        0.93
    izing       0.92
    oid         0.91
    izes        0.90
    oids        0.87
    ormal       0.85
    Activation Density: 0.008%

    No Known Activations