    NEGATIVE LOGITS
     اصول
    -0.08
     عملی
    -0.07
    ूसर
    -0.07
     JL
    -0.06
     Increment
    -0.06
    аться
    -0.06
    .FALSE
    -0.06
    ��
    -0.06
    Alter
    -0.06
    -0.06
    POSITIVE LOGITS
     chick
    0.06
    SSION
    0.06
     Pyongyang
    0.06
     di
    0.06
    livé
    0.06
     chicks
    0.06
    kid
    0.06
     referendum
    0.06
     Subway
    0.06
    probably
    0.06
    Activation Density: 0.010%

    No Known Activations