INDEX
    Explanations
    Negative Logits
     repeatedly     -0.07
    бат             -0.07
     несколько      -0.07
     recommend      -0.07
    (v              -0.07
     justamente     -0.07
    ik              -0.07
    ик              -0.07
     recommends     -0.07
                    -0.07
    Positive Logits
     unchanged      0.14
     changed        0.10
     weiterhin      0.10
     υπάρ           0.10
     기존           0.09
     변경           0.09
    unch            0.09
    /change         0.09
     fundamentally  0.08
     edelleen       0.08
    Activation Density: 0.058%

    No Known Activations