INDEX
    Explanations

    words that signify suffering or negative impacts of actions

    Negative Logits
    Token           Logit
    "ipple"         -0.15
    "Configurer"    -0.15
    "rops"          -0.15
    "ones"          -0.14
    "osa"           -0.14
    "af"            -0.14
    "adla"          -0.14
    " Wich"         -0.14
    "iers"          -0.13
    "lesh"          -0.13
    Positive Logits
    Token           Logit
    " nature"       0.21
    "nature"        0.16
    " involved"     0.15
    " Nature"       0.15
    " aspect"       0.15
    " they"         0.15
    "pirit"         0.15
    "à¹Įà¸Ł"        0.14
    " implicit"     0.14
    "865"           0.14
    Activation Density: 0.263%

    No Known Activations