INDEX
Explanations
references to conflict and violence, particularly related to terrorism and illegal settlements
Negative Logits
Mercy
-0.16
adm
-0.15
Anchor
-0.15
aram
-0.15
ovní
-0.15
oples
-0.14
kostenlose
-0.14
Kemal
-0.13
legg
-0.13
Dog
-0.13
Positive Logits
endra
0.20
ż
0.15
estroy
0.14
Lub
0.14
Chore
0.14
/ng
0.13
바이
0.13
adena
0.13
riot
0.13
寸
0.13
Activations Density 0.006%