INDEX
    Explanations

    words and phrases related to moral evaluations and social justice concepts

    Negative Logits
    Token        Logit
    "امة"        -0.16
    "epad"       -0.16
    "ongan"      -0.16
    "lse"        -0.15
    "пов"        -0.14
    "čan"        -0.14
    "etter"      -0.14
    "inke"       -0.14
    "ched"       -0.14
    "elle"       -0.13
    Positive Logits
    Token            Logit
    "igua"           0.17
    "/documentation" 0.15
    "awns"           0.15
    "sen"            0.15
    "eor"            0.15
    ".Bounds"        0.14
    " rem"           0.14
    "itos"           0.14
    "unya"           0.14
    " phil"          0.14
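The page lists these logit values without saying how they were produced. A common convention for sparse-autoencoder feature dashboards, assumed here and not confirmed by this page, is to project the feature's decoder direction onto the model's unembedding matrix, giving one value per vocabulary token, and then report the most negative and most positive tokens. The sketch below uses plain Python lists and hypothetical helper names (`logit_effects`, `top_logits`); it is an illustration of that convention, not the tool's actual code.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a (hypothetical) feature decoder direction onto the unembedding.

    decoder_dir has length d_model; unembed is d_model rows by vocab_size columns.
    Returns one dot product per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[d] * unembed[d][t] for d in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_logits(effects: Sequence[float], vocab: Sequence[str], k: int = 10
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, value) pairs."""
    # Token indices sorted from most negative to most positive effect.
    ranked = sorted(range(len(effects)), key=lambda i: effects[i])
    negative = [(vocab[i], effects[i]) for i in ranked[:k]]
    # Take the k largest and list them in descending order.
    positive = [(vocab[i], effects[i]) for i in reversed(ranked[-k:])]
    return negative, positive
```

With a toy four-token vocabulary, `top_logits([0.1, -0.2, 0.3, -0.05], ["a", "b", "c", "d"], k=2)` returns `[("b", -0.2), ("d", -0.05)]` and `[("c", 0.3), ("a", 0.1)]`.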
    Activation Density: 0.022%
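Activation density is usually read as the percentage of sampled token positions on which the feature fires. The minimal sketch below assumes a position counts as active when the feature's activation is strictly greater than a threshold of 0; the page states neither the threshold nor the sample size, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of positions whose activation exceeds threshold (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

Under this reading, 0.022% corresponds to roughly 22 active positions per 100,000 tokens.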

    No Known Activations