INDEX
    Explanations

    Punctuation and stop words

    Negative Logits
    Token (quoted)   Logit
    ' درجه'          -0.08
    'ż'              -0.07
    ' mar'           -0.06
    'imity'          -0.06
    'شاء'            -0.06
    '_prom'          -0.06
    ' thief'         -0.06
    ' sina'          -0.06
    ' možná'         -0.06
    'vh'             -0.06
    Positive Logits
    Token (quoted)   Logit
    ' INTERNAL'      0.07
    'yaw'            0.06
    '.INTERNAL'      0.06
    'UREMENT'        0.06
    ' empty'         0.06
    'aiser'          0.06
    '/callback'      0.06
    '."""'           0.06
    '============='  0.06
    ' Ip'            0.06
    Activation Density 0.052%

    No Known Activations