    Explanations

    words associated with various forms of disenfranchisement and discontent

    Negative Logits
    šem
    -0.15
    elson
    -0.15
    ทอง
    -0.14
    zet
    -0.14
    sto
    -0.14
    хи
    -0.14
     富
    -0.14
    ceptors
    -0.14
    abra
    -0.14
    kinson
    -0.14
    Positive Logits
    /dis
    0.21
    (dis
    0.20
     dis
    0.20
     Dis
    0.20
    Dis
    0.18
    -dis
    0.18
    .dis
    0.17
    zung
    0.17
    enance
    0.16
     DIS
    0.16
    Act Density 0.038%

    No Known Activations