INDEX
    Negative Logits
        Token          Logit
        " Frog"        -0.07
        "uido"         -0.06
        " Daten"       -0.06
        " Teacher"     -0.06
        "之一"         -0.06
        " sayı"        -0.06
        "(process"     -0.06
        "ucson"        -0.06
        "ENCES"        -0.06
        "BUF"          -0.06
    Positive Logits
        Token               Logit
        " r"                0.07
        [blank]             0.06
        "سك"                0.06
        "いか"              0.06
        " controversial"    0.06
        "_XDECREF"          0.06
        [blank]             0.06
        "biz"               0.06
        [blank]             0.06
        ".hasClass"         0.06
    Activation Density: 0.003%

    No Known Activations