INDEX
    Explanations
    Negative Logits
    Token             Logit
    "uada"            -0.07
    " phường"         -0.07
    "Wy"              -0.07
    "_ag"             -0.07
    " predominant"    -0.07
    "Modificar"       -0.07
    " ayrıntılı"      -0.06
    " Fuj"            -0.06
    "‐-"              -0.06
    " кафед"          -0.06
    Positive Logits
    Token             Logit
    "less"             0.22
    "LESS"             0.16
    "-less"            0.12
    "ess"              0.10
    "LES"              0.10
    "レス"             0.09
    " wireless"        0.09
    "iless"            0.09
    "lex"              0.08
    "есс"              0.08
    Activation Density: 0.013%

    No Known Activations
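The dashboard does not say how these figures were produced. Below is a minimal sketch that assumes the common sparse-autoencoder convention. Under that convention, a feature's logit effect on a token is the dot product of the feature's decoder direction with that token's unembedding vector, and activation density is the percentage of tokens on which the feature's activation is above zero. The function names, the zero threshold, and the toy vectors are all hypothetical. Real pipelines may also account for layer normalization, which this sketch ignores.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding row.

    Assumption: logit effect = decoder direction . unembedding (no layer norm).
    """
    return {tok: sum(d * u for d, u in zip(decoder_dir, row))
            for tok, row in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative tokens by logit effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    most_negative = ranked[:k]
    most_positive = list(reversed(ranked))[:k]
    return most_positive, most_negative


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    if not acts:
        return 0.0
    return 100.0 * sum(a > threshold for a in acts) / len(acts)


# Toy 2-dimensional example (hypothetical numbers, not taken from the dashboard).
decoder_dir = [1.0, 0.0]
unembed = {"less": [0.22, 0.5], "LESS": [0.16, -1.0], "uada": [-0.07, 3.0]}
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=1)
print(positive, negative)  # [('less', 0.22)] [('uada', -0.07)]

acts = [0.0] * 9999 + [1.3]  # the feature fires on 1 of 10,000 tokens
print(f"{activation_density(acts):.3f}%")  # 0.010%
```

With a density as low as the 0.013% reported here, a sampled activation set can easily contain no firing examples at all. That may be why the page lists no known activations.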