INDEX
    Explanations
    Negative Logits
    "ла"                0.66
    "ты"                0.66
    "("                 0.62
    "ти"                0.61
    [token not shown]   0.58
    " stockings"        0.54
    "νει"               0.52
    " sparsim"          0.52
    " mishaps"          0.51
    "ва"                0.50
    Positive Logits
    " is"               0.86
    "d"                 0.72
    " was"              0.72
    " de"               0.66
    " an"               0.66
    " que"              0.64
    " à"                0.64
    " are"              0.63
    " ה"                0.62
    [token not shown]   0.61
    Act Density 1.150%

    No Known Activations