    Negative Logits
     معرفی
    -0.07
    _idle
    -0.07
    worker
    -0.07
     προ
    -0.06
    anyahu
    -0.06
     lik
    -0.06
     contraseña
    -0.06
    زمان
    -0.06
     неприят
    -0.06
     yandan
    -0.06
    Positive Logits
    elerine
    0.07
    erb
    0.07
    Luckily
    0.07
    ming
    0.07
    	en
    0.07
    ْب
    0.06
    erties
    0.06
    having
    0.06
    Fortunately
    0.06
    0.06
    Activation Density 0.017%

    No Known Activations