    Negative Logits
    "Ét"          -0.08
    "от"          -0.08
    "–↵"          -0.07
    "inted"       -0.07
    "Exper"       -0.07
    " steaming"   -0.07
    "charg"       -0.07
    "angar"       -0.07
    "кот"         -0.07
    "Ng"          -0.07
    Positive Logits
    " sides"      0.10
    " πλευ"       0.10
    " equally"    0.09
    "Side"        0.09
    " alike"      0.09
    " Seiten"     0.08
    " стороны"    0.08
                  0.08
                  0.08
    "Sides"       0.08
    Activation Density: 0.026%

    No Known Activations