INDEX
    Explanations

    words related to political partisanship

    references to partisan politics

    Negative Logits
    Token       Logit
    ternally    -0.80
    ofi         -0.80
    worm        -0.79
    uras        -0.78
    ept         -0.77
    enium       -0.76
    uran        -0.76
    ees         -0.75
    worms       -0.74
    ulet        -0.74
    Positive Logits
    Token           Logit
     affiliation    0.93
     partisans      0.90
     partisan       0.89
     affili         0.85
     leaning        0.83
     bias           0.81
     politics       0.77
     loyalty        0.77
     persuasion     0.77
     correctness    0.76
    Activation Density: 0.040%

    No Known Activations