    Negative Logits
     welcome        -0.07
     rebate         -0.06
    스는            -0.06
     ولد            -0.06
                    -0.06
    Airport         -0.06
     Saturday       -0.06
    -hero           -0.06
     Encryption     -0.06
    _tag            -0.06
    Positive Logits
    .bi             0.07
     بواسطة         0.06
     interpolated   0.06
    wnd             0.06
     laugh          0.06
     ios            0.06
    .gsub           0.06
    ughs            0.06
    ENT             0.06
    ีพ              0.06
    Activation Density: 0.008%

    No Known Activations