    Negative Logits
    Token          Logit
    " szybko"      -0.89
    " Premises"    -0.79
    "zeniu"        -0.77
    " везе"        -0.77
    "пти"          -0.75
    " négy"        -0.74
    "gencia"       -0.72
    "WaitFor"      -0.72
    "zeme"         -0.71
    "ługa"         -0.71
    Positive Logits
    Token               Logit
    "rightarrow"        1.72
    " →"                1.27
    " to"               1.16
                        1.16
    " đến"              1.00
    " into"             0.89
    " droite"           0.87
    " ->"               0.86
    "longrightarrow"    0.85
    "->"                0.81
    Activation Density: 0.017%

    No Known Activations