INDEX
Explanations
actions associated with violence and attacks against specific groups of people
Negative Logits
difficulté          -0.59
difficultés         -0.50
KURZBESCHREIBUNG    -0.49
رایی                -0.49
itates              -0.47
textAppearance      -0.47
circonst            -0.47
concernés           -0.46
modalités           -0.46
compliqué           -0.45
Positive Logits
innocent            1.54
innocent            1.33
unsuspecting        1.16
innoc               1.15
inocente            1.09
defen               1.07
innoc               1.05
Innocent            1.00
Innoc               0.98
innocence           0.94
Activations Density: 0.624%