Explanations
mentions of oppressive leaders or regimes, especially those referred to as dictators
references to authoritarian leaders and regimes
Negative Logits
alle     -0.80
IGH      -0.80
awks     -0.78
older    -0.78
atha     -0.76
WARD     -0.71
Lear     -0.70
ttp      -0.70
forth    -0.70
Sense    -0.70
Positive Logits
dictator       1.20
dictatorship   0.99
dictators      0.98
overth         0.87
Hussein        0.83
decree         0.80
Saddam         0.79
tyrant         0.78
nomine         0.78
regimes        0.77
Activations Density: 0.024%