INDEX
    Explanations

    after punctuation delimiters

    Negative Logits
    ق         0.75
    q         0.73
    (blank)   0.60
    3         0.59
    (blank)   0.57
    (blank)   0.56
    u         0.55
    ik        0.54
    ان        0.53
    م         0.53
    Positive Logits
     savaş       0.63
     dakkh       0.63
     psik        0.59
     saddo       0.59
     dvara       0.59
     harmonize   0.58
     gimnas      0.57
     kadın       0.56
     atlet       0.56
     poin        0.55
    Act Density 0.000%

    No Known Activations