INDEX
    Explanations

    instances of violence or attacks involving various groups and individuals

    Negative Logits
    ÑģÑĤвÑĥ
    -0.15
     visual
    -0.15
    uga
    -0.15
    ÏĥÏĦÏĮ
    -0.14
    ailable
    -0.14
    ech
    -0.14
    inally
    -0.14
    assadors
    -0.14
    /layouts
    -0.14
    anto
    -0.13
    Positive Logits
    çĻº
    0.16
    ever
    0.15
    led
    0.14
    igh
    0.14
    á»ĵ
    0.14
    _STANDARD
    0.14
     Synd
    0.13
    Ø®Ùģ
    0.13
    .ua
    0.13
    heim
    0.13
    Act Density 0.239%

    No Known Activations