INDEX
    Explanations

    mentions of oppressive leaders or regimes, especially those referred to as dictators

    references to authoritarian leaders and regimes

    Negative Logits
    alle     -0.80
    IGH      -0.80
    awks     -0.78
    older    -0.78
    atha     -0.76
    WARD     -0.71
    Lear     -0.70
    ttp      -0.70
    forth    -0.70
    Sense    -0.70
    Positive Logits
     dictator        1.20
     dictatorship    0.99
     dictators       0.98
     overth          0.87
     Hussein         0.83
     decree          0.80
     Saddam          0.79
     tyrant          0.78
     nomine          0.78
     regimes         0.77
    Activation Density: 0.024%

    No Known Activations