INDEX
Explanations
references to specific individuals and their roles in political or social events
Negative Logits
UNC
-0.17
.yy
-0.16
insky
-0.16
ンガ
-0.16
amu
-0.15
Islam
-0.15
ATCH
-0.15
hammad
-0.15
럼
-0.14
sns
-0.14
Positive Logits
Iraqi
0.46
Iraq
0.44
Baghdad
0.44
Iraq
0.38
Mosul
0.37
العراق
0.32
عراق
0.31
Saddam
0.30
Kirk
0.29
Babylon
0.29
Activations Density 0.088%