    Explanations

    words related to dehumanization and its effects

    Negative Logits
        " Elli"           -0.16
        "olding"          -0.15
        " Typ"            -0.15
        "deck"            -0.15
        " Amph"           -0.15
        "gend"            -0.15
        " comp"           -0.15
        "ruz"             -0.14
        " Ele"            -0.14
        "StreamReader"    -0.14

    Positive Logits
        "human"            0.20
        "omon"             0.18
        " facto"           0.18
        " rig"             0.17
        ".construct"       0.17
        "value"            0.17
        "grading"          0.16
        "construct"        0.16
        "icide"            0.16
        "优"               0.16
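Dashboards of this kind typically build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix. They then read off the most positive and most negative vocabulary entries. The sketch below assumes exactly that and ignores any final layer norm. The helper `top_logit_effects`, the toy vocabulary `tok_a` to `tok_d`, and the toy weights are hypothetical and are not taken from this feature.

```python
# Hypothetical sketch: rank vocabulary tokens by how much a feature's
# decoder direction raises or lowers their logits (d @ W_U), ignoring
# any final layer norm the real model may apply.

def top_logit_effects(decoder_dir, unembed, vocab, k=3):
    """Return (top_positive, top_negative) lists of (token, score) pairs.

    decoder_dir: list of d_model floats (the feature's output direction).
    unembed: d_model rows, each a list of len(vocab) floats (W_U).
    vocab: token strings, one per unembedding column.
    """
    d_model = len(decoder_dir)
    # score[t] = sum_i decoder_dir[i] * unembed[i][t]
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]            # most negative scores first
    top_positive = ranked[::-1][:k]      # most positive scores first
    return top_positive, top_negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up values).
    vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
    decoder_dir = [1.0, -0.5]
    unembed = [
        [0.2, 0.1, -0.1, 0.0],
        [0.0, -0.1, 0.1, 0.2],
    ]
    pos, neg = top_logit_effects(decoder_dir, unembed, vocab, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

A real dashboard would do the same projection with the model's actual `W_U` over the full vocabulary. The sort-and-slice step is the only part that produces the two ranked lists.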
    Activation Density 0.019%

    No Known Activations
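Activation density is the fraction of sampled token positions on which the feature fires. A density of 0.019% therefore works out to roughly 19 firing positions per 100,000 tokens. The sketch below assumes that "fires" means an activation strictly above zero; the threshold a given dashboard uses may differ. The helper `activation_density` is hypothetical.

```python
# Hypothetical sketch: activation density as the share of token positions
# whose feature activation exceeds a threshold (assumed 0.0 here).

def activation_density(activations, threshold=0.0):
    """Fraction of positions where the activation is strictly above threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Made-up sample: 19 firing positions out of 100,000 tokens.
    acts = [0.0] * 99_981 + [1.0] * 19
    density = activation_density(acts)
    print(f"{density * 100:.3f}%")
```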