INDEX
    Explanations

    references to specific public figures and political context

    Negative Logits
    abbit       -0.18
     Alonso     -0.17
     Alan       -0.17
    -al         -0.17
    alg         -0.17
    alm         -0.17
     Alabama    -0.17
    abb         -0.16
    aal         -0.16
     Alban      -0.16
    Positive Logits
    cheid       0.15
     B          0.14
    .Bad        0.14
     Ch         0.14
    auss        0.14
     BCH        0.14
     !***       0.14
    áºŃu        0.14
     Ðij        0.14
    ÂłB         0.14
    Act Density 0.061%

    No Known Activations