INDEX
    Explanations

    mentions of suffering and related concepts, such as pain, exploitation, and freedom

    references to suffering and its impact on individuals and society

    Negative Logits
    "sure"         -0.67
    "sports"       -0.67
    " Collider"    -0.66
    " latch"       -0.65
    "cluding"      -0.63
    "clude"        -0.62
    "ioch"         -0.62
    " lev"         -0.62
    "leans"        -0.61
    "ouncing"      -0.60
    Positive Logits
    " inflicted"    0.90
    "lehem"         0.83
    " Nadu"         0.77
    " endured"      0.77
    "hani"          0.76
    " setbacks"     0.74
    "lessly"        0.74
    "nesses"        0.73
    " suffered"     0.73
    " suffering"    0.72
    Activation Density: 0.025%

    No Known Activations