    Explanations

    references to politics and social issues

    Negative Logits
    zap         -0.18
     suc        -0.17
    itler       -0.16
    chen        -0.15
    andro       -0.15
    ester       -0.14
    /http       -0.14
    holm        -0.14
    нка         -0.14
     bed        -0.14
    Positive Logits
    ugi          0.17
    æį           0.16
    ãĥ³ãĥIJ       0.15
    -prepend     0.15
    880          0.14
    ĶåĽŀ         0.14
    ÑģÑĤи        0.14
    ëª           0.14
    ikon         0.14
     Hag         0.14
    Activation Density: 0.055%

    No Known Activations