INDEX
    Negative Logits
    Token          Logit
    " aggiunto"    -0.82
    "новых"        -0.77
    " noo"         -0.77
    " слад"        -0.76
    " everybody"   -0.73
                   -0.73
    "دو"           -0.71
                   -0.71
    " вектора"     -0.71
    "wußt"         -0.71
    Positive Logits
    Token          Logit
    "Cuar"         1.11
    " Quar"        0.94
    "qr"           0.91
    " quar"        0.91
    " grained"     0.81
    " квар"        0.79
    "分かる"       0.79
    "Quar"         0.77
    " Cuar"        0.77
    "tering"       0.77
    Activation Density 0.019%

    No Known Activations