INDEX
    Explanations

    violent actions and physical harm

    references to acts of violence or severe harm

    Negative Logits
    Token           Logit
    "entric"        -0.71
    "uci"           -0.69
    " Collabor"     -0.65
    "issions"       -0.63
    "ty"            -0.61
    "TP"            -0.60
    " extension"    -0.60
    "impl"          -0.60
    " Mutual"       -0.59
    "AE"            -0.59
    Positive Logits
    Token           Logit
    " beaten"       3.79
    " beat"         1.93
    " beating"      1.85
    "beat"          1.59
    " battered"     1.55
    " slain"        1.44
    " defeated"     1.42
    "Beat"          1.39
    " beats"        1.34
    " bruised"      1.33
    Activation Density: 0.018%

    No Known Activations