INDEX
    Explanations

    references to political parties and ideologies

    Negative Logits
    ucher   -0.18
    bung    -0.18
    ãĥ³ãĤº  -0.17
    ersh    -0.17
    ignant  -0.15
    prene   -0.15
    onomy   -0.15
    Ñĥже    -0.15
    forman  -0.14
    itest   -0.14
    Positive Logits
    zsche   0.17
    antine  0.17
     âĻ     0.16
    phalt   0.14
     jig    0.14
    364     0.14
    NT      0.14
     sav    0.14
    agg     0.14
    lr      0.13
    Act Density 0.025%

    No Known Activations