INDEX
Explanations
references to geopolitical conflicts and military actions
Negative Logits
Token    Logit
THROW    -0.16
Pik      -0.16
zik      -0.16
Abe      -0.16
Xi       -0.15
andle    -0.15
玄       -0.15
atal     -0.15
kir      -0.15
912      -0.14
Positive Logits
Token      Logit
Iraqi      0.35
Saddam     0.35
Iraq       0.34
Sadd       0.31
Baghdad    0.31
Coalition  0.31
Iraq       0.30
coalition  0.29
العراق     0.27
عراق       0.25
Activations Density 0.025%