    Explanations

    instances of violence and death

    Negative Logits
    "ampie"     -0.17
    "cka"       -0.17
    "egot"      -0.16
    "upt"       -0.15
    "inki"      -0.15
    "uffman"    -0.15
    "plib"      -0.15
    "egend"     -0.15
    "abant"     -0.15
    "erval"     -0.14
    Positive Logits
    " unconscious"      0.31
    " conv"             0.28
    " gas"              0.28
    " conscious"        0.26
    "gas"               0.25
    "conscious"         0.24
    " consciousness"    0.24
    " motion"           0.24
    " struggling"       0.24
    " woo"              0.23
    Activation Density: 0.364%

    No Known Activations