INDEX
    Explanations

    mentions of politics and corruption

    Negative Logits
    " Cancel"       -0.90
    "actory"        -0.84
    "imates"        -0.77
    "wolves"        -0.74
    "oa"            -0.72
    "amination"     -0.71
    "tered"         -0.70
    " Takeru"       -0.70
    "wered"         -0.70
    "olen"          -0.68
    Positive Logits
    " correctness"  1.37
    "eering"        0.97
    " activism"     0.90
    " intrig"       0.89
    " pund"         0.88
    " appoint"      0.88
    " rhetoric"     0.87
    " clout"        0.82
    " affiliation"  0.81
    " affili"       0.81
    Activation Density: 1.441%

    No Known Activations