INDEX
    Explanations

    terms related to political positions or stances

    Negative Logits
    Token           Logit
    "aeda"          -0.15
    "isay"          -0.15
    "ácil"          -0.15
    "pedia"         -0.15
    "ôte"           -0.15
    "ween"          -0.15
    "lena"          -0.14
    "767"           -0.14
    "ongyang"       -0.14
    "amedi"         -0.14

    Positive Logits
    Token           Logit
    ".FontStyle"    0.15
    "IBUT"          0.15
    "STA"           0.15
    "ãĤ«ãĥ«"        0.14
    ".xtext"        0.14
    "criptor"       0.14
    "ee"            0.14
    "Kom"           0.14
    " exemplary"    0.14
    "æŁľ"           0.14

    Activation Density: 0.007%

    No Known Activations