Explanations
references to Iraq and related political figures or events
Negative Logits
providedIn    -0.83
Gifford       -0.74
Ves           -0.74
Hern          -0.73
rosen         -0.73
Kul           -0.72
Gat           -0.72
Guan          -0.72
skating       -0.71
Kari          -0.71
Positive Logits
Iraq          1.88
Iraq          1.73
Iraqi         1.49
Baghdad       1.32
Ira           1.16
Saddam        1.16
Bagdad        1.10
Irak          1.09
عراق          1.05
Irak          0.98
Activations Density 0.002%
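
For readers unfamiliar with the density figure, the following is a minimal sketch of how such a number could be computed, assuming that "activations density" means the fraction of token positions on which the feature fires. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all illustrative assumptions, not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 2 of 100,000 token positions.
acts = [0.0] * 100_000
acts[10] = 3.2
acts[500] = 1.1

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.002%
```

Under this reading, a density of 0.002% corresponds to roughly one active token in every 50,000, which fits a feature that responds only to a narrow topic.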