INDEX
    Explanations

    phrases related to politics and government

    Negative Logits
    bender      -0.78
    puff        -0.76
    wic         -0.73
    yang        -0.69
    wrap        -0.69
    FU          -0.68
     Levine     -0.68
    quart       -0.67
    tar         -0.66
     Shapiro    -0.65
    Positive Logits
    selves      1.32
     own        1.20
     ancestors  1.16
     nation     1.04
     selves     1.03
     beloved    1.02
     collective 1.01
     hearts     1.01
     asses      0.96
     shores     0.95
    Act Density 0.126%

    No Known Activations