    Negative Logits
    Token            Logit
    " uncovered"     -0.07
    " став"          -0.07
    " antiviral"     -0.07
    "_duplicates"    -0.07
    " Royals"        -0.07
    " validade"      -0.06
    " insects"       -0.06
    " sexist"        -0.06
    " Doc"           -0.06
    " Vir"           -0.06

    Positive Logits
    Token            Logit
    " ayer"          0.09
    " الداخ"         0.09
    " tawm"          0.08
    " mash"          0.08
    " დან"           0.08
    " dijo"          0.08
    " கூட்ட"         0.08
    " beho"          0.08
    "LOW"            0.08
    " veldig"        0.08
    Activation Density: 0.015%

    No Known Activations