    Explanations

    mentions of negative incidents in society, such as harassment, violence, and tragedies

    references to violence and conflict

    Negative Logits
        "Firstly"       -0.84
        "\)"            -0.75
        " Firstly"      -0.73
        "Appearance"    -0.70
        "Very"          -0.69
        "Primary"       -0.69
        ",''"           -0.69
        "Material"      -0.67
        ".}"            -0.67
        "operation"     -0.67
    Positive Logits
        " tsun"         0.78
        " cannibal"     0.76
        " poisoned"     0.75
        " coughing"     0.73
        " inexpl"       0.72
        " Kardashian"   0.71
        " disgr"        0.69
        " botched"      0.69
        " grizz"        0.68
        " assass"       0.68
    Activation Density: 1.039%
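To read the density figure, here is a minimal sketch that assumes "activation density" means the fraction of sampled tokens on which the feature's activation is above zero. The dashboard does not state its threshold or dataset, so both are assumptions, and `activation_density` is a hypothetical helper. Read this way, 1.039% means the feature fires on about 1 token in 96.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which a feature's activation exceeds `threshold`.

    `activations` is assumed to be a 1-D array holding one feature's activation
    for each token in some sampled dataset (hypothetical input).
    """
    acts = np.asarray(activations, dtype=float)
    # Boolean mask of "feature is active" per token; its mean is the active fraction.
    return float((acts > threshold).mean())


# Toy example: the feature fires on 1 of 8 tokens.
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
density = activation_density(acts)
print(f"Activation density: {density:.3%}")
```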

    No Known Activations