    Explanations

    references to racial identity and dehumanization

    Negative Logits
    "estroy"        -0.18
    "uitka"         -0.17
    ".fm"           -0.16
    "idor"          -0.16
    "olest"         -0.15
    "PropTypes"     -0.15
    "StateManager"  -0.15
    "onas"          -0.15
    "orta"          -0.14
    "ーレ"          -0.14

    Positive Logits
    " rights"          0.16
    " cheap"           0.16
    " Fir"             0.14
    "ίων"              0.14
    "-rights"          0.14
    " Tanner"          0.14
    " treated"         0.14
    " disposable"      0.14
    "ahn"              0.14
    " systematically"  0.14
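    The logit lists above are usually read as the feature's direct effect on the output vocabulary. The following is a minimal sketch, assuming the feature writes a fixed decoder direction into the residual stream and that its effect on each token is that direction multiplied by the model's unembedding matrix (any final layer norm is ignored). The function name, argument names and toy values are hypothetical, not the dashboard's own code.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_row: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k lowest and k highest tokens for one feature direction.

    Hypothetical inputs (assumptions, not a real library API):
      decoder_row: the feature's d_model-dimensional output direction.
      unembed: d_model x vocab_size unembedding matrix, row-major.
      vocab: token strings, one per unembedding column.
    Simplification: no final layer norm is applied.
    """
    d_model = len(decoder_row)
    # logit[t] = sum_i decoder_row[i] * unembed[i][t]
    logits = [
        sum(decoder_row[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort (token, logit) pairs from most negative to most positive.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]            # lowest logits, most negative first
    positive = ranked[::-1][:k]      # highest logits, most positive first
    return negative, positive


# Toy usage: a 2-dimensional model with a 4-token vocabulary.
neg, pos = top_logit_effects(
    [1.0, -0.5],
    [[0.2, -0.1, 0.0, 0.3], [0.1, 0.4, -0.2, 0.0]],
    [" rights", "estroy", " cheap", " treated"],
    k=2,
)
```

    With a real vocabulary of tens of thousands of tokens a vectorized library would replace the nested loop; the pure-Python version only keeps the sketch dependency-free.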
    Activation Density 0.147%
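    Activation density is commonly reported as the share of sampled tokens on which the feature's activation is above zero. A minimal sketch, assuming a threshold of 0.0 and a flat list of per-token activations; the function name and the sample data are hypothetical. Under that reading, 0.147% corresponds to roughly 147 active tokens per 100,000.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (0.0 by default).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Hypothetical sample: 147 firing tokens out of 100,000.
sample = [0.0] * 99_853 + [0.8] * 147
density = activation_density(sample)
```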

    No Known Activations