INDEX
    Explanations

    mentions of violence and its various contexts or impacts

    Negative Logits
    opa
    -0.17
    lify
    -0.16
    oga
    -0.16
    akin
    -0.15
    ublish
    -0.15
    иш
    -0.15
    _printf
    -0.14
    овый
    -0.14
    cheid
    -0.14
    ampoo
    -0.14
    Positive Logits
     directed
    0.22
     toward
    0.21
     towards
    0.21
     Against
    0.21
     against
    0.21
     Tow
    0.20
    /ag
    0.18
     Towards
    0.18
    ive
    0.17
    Against
    0.15
    Act Density 0.031%

    No Known Activations