INDEX
    Explanations

    words or phrases related to politics

    references to political concepts or discussions

    Negative Logits
        "olen"      -0.83
        "xt"        -0.80
        "eret"      -0.80
        "hem"       -0.79
        "IER"       -0.78
        "oning"     -0.77
        "imus"      -0.76
        " Cancel"   -0.75
        "alin"      -0.74
        "cellent"   -0.72

    Positive Logits
        " correctness"  1.24
        " persuasion"   1.04
        " affili"       0.97
        " affiliation"  0.97
        " activism"     0.97
        " affairs"      0.92
        " satire"       0.91
        " fallout"      0.91
        " partisans"    0.91
        " ideology"     0.90

    Activation Density: 0.033%

    No Known Activations