INDEX
    Explanations

    expressions of moral responsibility and social justice actions

    Negative Logits
    loh
    -0.18
    zsche
    -0.15
    uluk
    -0.15
     خارجية
    -0.15
    ilen
    -0.15
    ieri
    -0.15
    deaux
    -0.15
    uiten
    -0.15
    ULA
    -0.14
    ilik
    -0.14
    Positive Logits
     us
    0.36
     ourselves
    0.30
     we
    0.27
    我们
    0.25
     society
    0.23
     ours
    0.23
     our
    0.21
    我們
    0.21
     everyone
    0.21
     nós
    0.21
    Act Density 0.253%

    No Known Activations