    Negative Logits
    Token                 Logit
    (token not shown)     -0.07
    "------↵"             -0.07
    " Parks"              -0.06
    " criticism"          -0.06
    " geliyor"            -0.06
    " ActionListener"     -0.06
    " поль"               -0.06
    "ッツ"                -0.06
    "-football"           -0.06
    "رود"                 -0.06
    Positive Logits
    Token                 Logit
    "431"                 0.07
    " Plug"               0.06
    "/pg"                 0.06
    "\Category"           0.06
    " glColor"            0.06
    (token not shown)     0.06
    " Sms"                0.06
    " Ana"                0.06
    " харч"               0.06
    "opleft"              0.06
    Activation Density: 0.004%

    No Known Activations