    Explanations

    descriptions related to violent or harmful actions

    instances of the word "brutal" in contexts related to violence or suffering

    Negative Logits
    Token          Logit
    "leaf"         -0.87
    "ploma"        -0.77
    "ource"        -0.77
    "cript"        -0.76
    "verage"       -0.75
    "OPLE"         -0.74
    "clips"        -0.74
    "BU"           -0.71
    "arten"        -0.71
    "Recommend"    -0.71
    Positive Logits
    Token             Logit
    " assault"        1.03
    "ized"            1.01
    " assaults"       0.97
    " murders"        0.95
    " torture"        0.93
    "izing"           0.92
    " punishments"    0.91
    " beasts"         0.89
    " murder"         0.85
    " retribution"    0.84
    Activation Density: 0.040%
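
The density figure above is usually read as the share of sampled token positions on which the feature fires at all. The sketch below illustrates that reading only. The function name `activation_density`, the zero threshold, and the sample data are all assumptions made for illustration; none of them come from the dashboard, and the dashboard's exact sampling procedure is not stated here.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a feature "fires" on a token when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 4 firing positions out of 10,000 tokens.
sample = [0.0] * 9996 + [1.2, 0.7, 3.1, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.040%"
```

Under this reading, a density of 0.040% means the feature is active on roughly 4 of every 10,000 sampled tokens, which is consistent with a sparse feature tied to a narrow set of violence-related words.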

    No Known Activations