INDEX
    Explanations
    Negative Logits
     phạt               -0.07
     DevExpress         -0.07
    woord               -0.07
    achte               -0.07
     листопада          -0.06
    iyeti               -0.06
     tỉnh               -0.06
     역시               -0.06
     scam               -0.06
     вдруг              -0.06
    Positive Logits
     listBox            0.07
    uuml                0.07
    ��                  0.07
     inspirational      0.06
     receiving          0.06
    تق                  0.06
     Shame              0.06
    organic             0.06
    .\                  0.06
     discriminate       0.06
    Activation Density: 0.002%

    No Known Activations