INDEX
    Explanations

    mentions of intense suffering, pain, and related concepts

    references to suffering and its various contexts

    Negative Logits
    Token          Logit
    "ouncing"      -0.68
    "sure"         -0.67
    " Collider"    -0.67
    "clude"        -0.66
    "sports"       -0.64
    "cluding"      -0.64
    "leans"        -0.63
    "reek"         -0.62
    " lev"         -0.62
    "wed"          -0.62
    Positive Logits
    Token          Logit
    " inflicted"   0.97
    "hani"         0.83
    " Nadu"        0.82
    " miser"       0.78
    " horribly"    0.77
    " fools"       0.76
    "lessly"       0.76
    " havoc"       0.75
    " agony"       0.75
    " endured"     0.74
    Act Density 0.034%
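
For a sense of scale, the density figure can be read as a fraction of tokens. The sketch below assumes that activation density means the share of sampled tokens on which the feature's activation is above zero. The page does not state that definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to reproduce the 0.034% figure.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" counts a token as active when its activation
    is strictly greater than the threshold (zero by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 34 active tokens out of 100,000 in total.
sample = [1.5] * 34 + [0.0] * 99_966
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.034%"
```

Under that reading, a density of 0.034% corresponds to roughly 34 active tokens per 100,000, which would make this a sparse feature.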

    No Known Activations