    Explanations

    Mentions and discussions of violence, particularly its impact on various societal issues.

    Negative Logits
    "иÑĪ"         -0.16
    "size"        -0.15
    "lify"        -0.15
    "ublish"      -0.15
    "opa"         -0.15
    "akin"        -0.15
    "ikat"        -0.15
    "spy"         -0.15
    "rid"         -0.14
    "istical"     -0.14
    Positive Logits
    " directed"    0.24
    " towards"     0.23
    " against"     0.23
    " toward"      0.23
    " Against"     0.22
    " committed"   0.20
    " Tow"         0.20
    " Towards"     0.18
    "/ag"          0.18
    "against"      0.17
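    The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. The page does not say how these scores are computed; a common approach, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector and keep the top and bottom k. The sketch below uses plain Python lists; the helper name `top_logit_effects` and its inputs are hypothetical.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def top_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Hypothetical helper: score each token by dot(decoder_dir, unembedding)
    and return the k most positive and k most negative tokens."""
    scores: Dict[str, float] = {}
    for tok, col in unembed.items():
        if len(col) != len(decoder_dir):
            raise ValueError(f"dimension mismatch for token {tok!r}")
        # Assumed scoring rule: direct effect of the feature on this token's logit.
        scores[tok] = sum(d * u for d, u in zip(decoder_dir, col))

    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example; values are invented for illustration only.
    toy_unembed = {
        " against": [0.2, 0.1],
        " towards": [0.1, 0.2],
        "size": [-0.1, -0.1],
        "spy": [-0.2, 0.0],
    }
    pos, neg = top_logit_effects([1.0, 0.5], toy_unembed, k=2)
    print([t for t, _ in pos])  # " against" ranks above " towards"
    print([t for t, _ in neg])  # "spy" ranks below "size"
```

    Quoting tokens (as in the lists above) keeps the leading space visible, which distinguishes " against" from "against" in byte-level BPE vocabularies.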
    Activation Density 0.024%
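    Activation density is usually read as the percentage of sampled token positions on which the feature fires. The page states neither the firing threshold nor the sample size, so the sketch below assumes "fires" means an activation strictly above 0 and treats the sample as whatever list is passed in; the function name is hypothetical. Under that reading, 0.024% means roughly 24 active positions per 100,000 tokens.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions whose activation exceeds threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # assumed firing criterion
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # 24 firing positions out of 100,000 sampled tokens (invented numbers).
    sample = [0.0] * 99_976 + [1.3] * 24
    print(f"{activation_density(sample):.3f}%")
```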

    No Known Activations