    Explanations

    references to violence and calls for justice

    Negative Logits
    Token       Logit
    "éĭ"        -0.20
    "λÏħ"       -0.16
    " Sez"      -0.16
    "ATURE"     -0.15
    "oyer"      -0.15
    "æľĭ"       -0.15
    "änge"      -0.15
    "ellig"     -0.15
    "ccione"    -0.15
    "ç½²"       -0.14
    Positive Logits
    Token               Logit
    " justice"          0.21
    " identification"   0.17
    " cul"              0.17
    "anship"            0.16
    "hir"               0.16
    " identified"       0.16
    "justice"           0.15
    " culprit"          0.15
    "æŃ"                0.15
    " вин"              0.15
    Activation Density: 0.131%

    No Known Activations