INDEX
    Explanations

    expressions of moral and ethical condemnation related to conflict and suffering

    Negative Logits
    ãĥģãĥ£        -0.15
     phái         -0.15
    cus           -0.15
    ermen         -0.14
     Roose        -0.14
    pak           -0.14
    Cli           -0.14
     handshake    -0.14
     foo          -0.13
    ORK           -0.13
    Positive Logits
     dispos       0.18
     sett         0.16
    gaard         0.16
     apartheid    0.16
    annes         0.15
     Gros         0.15
     Pall         0.15
    ey            0.14
    jspx          0.14
    ĭ             0.14
    Activation Density: 0.024%

    No Known Activations