INDEX
    Explanations

    words related to demonization or criticism

    references to demonization and related concepts

    Negative Logits
    Token           Logit
    "ippi"          -0.97
    "RAFT"          -0.79
    "IGH"           -0.74
    "ļéĨĴ"          -0.69
    "sers"          -0.69
    "å"             -0.68
    " Seah"         -0.66
    "aird"          -0.65
    "ILLE"          -0.65
    "rehensive"     -0.65

    Positive Logits
    Token           Logit
    "stration"       0.94
    "iac"            0.93
    "ises"           0.87
    "ised"           0.86
    "izing"          0.86
    "ising"          0.86
    "ized"           0.82
    "ization"        0.81
    "oid"            0.81
    "izes"           0.81

    Activation Density: 0.007%

    No Known Activations