    NEGATIVE LOGITS
    Token               Logit
    " После"            0.77
    " Після"            0.77
    " Ту"               0.76
    " То"               0.74
    " Многие"           0.74
    " Смо"              0.73
    " Ио"               0.73
    " Когда"            0.72
    " Там"              0.72
    " adimensional"     0.71
    POSITIVE LOGITS
    Token               Logit
    "ed"                1.11
    "ت"                 1.10
    "ing"               1.04
    "í"                 0.91
    " are"              0.88
    " as"               0.86
    "ط"                 0.86
    "os"                0.84
    "un"                0.82
    "ING"               0.82
    Activation Density: 0.006%

    No Known Activations