INDEX
    Explanations

    instances of violent actions or events

    NEGATIVE LOGITS
    ycastle    -0.17
    .dsl       -0.15
    Ze         -0.15
    g          -0.15
    ault       -0.15
    732        -0.15
    .bundle    -0.15
    yles       -0.14
     */;↵      -0.14
    á»įc       -0.14
    POSITIVE LOGITS
    ãĤ¤ãĥĪ           0.15
     unto           0.15
    INED            0.15
    нав             0.14
    داد             0.14
    jit             0.14
    inky            0.14
    @ResponseBody   0.14
    akin            0.13
    ضÛĮ             0.13
    Act Density 0.015%

    No Known Activations