INDEX
    Explanations

    actions associated with violence and attacks against specific groups of people

    Negative Logits
     difficulté
    -0.59
     difficultés
    -0.50
    KURZBESCHREIBUNG
    -0.49
    رایی
    -0.49
    itates
    -0.47
    textAppearance
    -0.47
     circonst
    -0.47
     concernés
    -0.46
     modalités
    -0.46
     compliqué
    -0.45
    Positive Logits
     innocent
    1.54
    innocent
    1.33
     unsuspecting
    1.16
     innoc
    1.15
     inocente
    1.09
     defen
    1.07
    innoc
    1.05
     Innocent
    1.00
    Innoc
    0.98
     innocence
    0.94
    Activation Density: 0.624%

    No Known Activations