    Explanations

    references to political opposition and public criticism

    Negative Logits
    "ktop"         -0.17
    "yna"          -0.16
    "宣"           -0.16
    "amed"         -0.14
    "ripp"         -0.14
    "itchen"       -0.13
    "ennes"        -0.13
    "ien"          -0.13
    "nnen"         -0.13
    " pronounce"   -0.13
    Positive Logits
    "Privacy"      0.17
    " Privacy"     0.16
    "ohana"        0.16
    " privacy"     0.16
    "urum"         0.15
    "squ"          0.15
    " groups"      0.14
    "à¸ı"          0.14
    " some"        0.13
    " Trim"        0.13
    Activation Density: 0.119%

    No Known Activations