INDEX
Explanations
references to violence in various contexts
Negative Logits
Token               Logit
fjspx               -0.67
ContentAlignment    -0.66
مرئيه               -0.59
брь                 -0.56
pocz                -0.54
NOPQRST             -0.52
Heu                 -0.52
wą                  -0.52
Revival             -0.52
tre                 -0.51
Positive Logits
Token               Logit
violence            2.45
Violence            2.22
violence            2.18
Violence            2.13
violent             2.07
violent             1.94
Violent             1.90
Violent             1.83
violencia           1.74
violen              1.71
Activations Density: 0.127%