Explanations
references to violence and aggressive behavior
Negative Logits
Token         Logit
NOPQRST       -0.79
følgelig      -0.70
profondeur    -0.70
ascin         -0.69
مرئيه         -0.69
виправивши    -0.67
fermés        -0.65
Dek           -0.65
vernote       -0.64
ggable        -0.64
Positive Logits
Token         Logit
violence      2.17
Violence      2.01
violence      1.83
Violence      1.82
violent       1.74
Violent       1.62
violen        1.62
violent       1.61
Violent       1.60
violencia     1.37
Activations Density: 0.062%