INDEX
    Explanations

    negative descriptors related to severity and cruelty

    Negative Logits
    OrFail      -0.18
    ONSE        -0.17
    zier        -0.15
    368         -0.15
    ypse        -0.15
    ový         -0.14
     Skyl       -0.14
    tí          -0.14
    ossal       -0.14
    OrCreate    -0.14
    Positive Logits
    vard        0.21
    -hard       0.14
    erner       0.14
     Tough      0.14
    dre         0.14
    ned         0.14
    -cookie     0.14
    ening       0.13
    .selector   0.13
    sock        0.13
    Activation Density: 0.060%

    No Known Activations