INDEX
    Explanations
    Negative Logits
    labels
    -0.08
     چنین
    -0.07
     هایی
    -0.07
    <Guid
    -0.06
    ιας
    -0.06
    Doctrine
    -0.06
    -0.06
    ربه
    -0.06
     lại
    -0.06
     Кроме
    -0.06
    Positive Logits
     pratic
    0.07
     PROT
    0.07
     abdom
    0.07
     고객
    0.06
     disrupt
    0.06
     жен
    0.06
     Willie
    0.06
     Tart
    0.06
    (Api
    0.06
     เช
    0.06
    Activation Density: 0.000%

    No Known Activations