INDEX
    Explanations

    mentions of violent or cruel acts

    references to violence or severe harm

    Negative Logits
    cript         -0.78
    annis         -0.75
    ploma         -0.73
    kj            -0.73
    verage        -0.72
    leaf          -0.72
    OPLE          -0.71
    FU            -0.71
     Libraries    -0.70
    BU            -0.68
    Positive Logits
     earthqu      0.91
    ized          0.90
     assault      0.85
     beasts       0.85
     killers      0.82
     assaults     0.80
     punishments  0.80
     dictator     0.79
     murdering    0.78
     murders      0.78
    Activation Density 0.017%

    No Known Activations