INDEX
    Explanations
    Negative Logits
    " معرفی"          -0.08
    "ères"            -0.08
    " mots"           -0.08
    " featured"       -0.08
    ",两"             -0.08
    " мақ"            -0.08
    "ың"              -0.08
    " služ"           -0.08
    "edish"           -0.08
    "éro"             -0.07
    Positive Logits
    " insignificant"  0.09
    "_generic"        0.08
    " improperly"     0.08
    "addin"           0.08
    " profundamente"  0.08
    "kernel"          0.07
    "γμα"             0.07
    " enriched"       0.07
    " uncommon"       0.07
    "city"            0.07
    Act Density 0.001%

    No Known Activations