    Explanations

    of and word

    Negative Logits
    "ẳn"           -0.08
    "iado"         -0.06
    " совершенно"  -0.06
    " dạy"         -0.06
    "-rock"        -0.06
    " thêm"        -0.06
    " cả"          -0.06
    "mayı"         -0.06
    " іншого"      -0.06
    "ňují"         -0.06
    Positive Logits
    " WORD"                0.09
    " word"                0.07
    "-word"                0.07
    (token not rendered)   0.06
    " WH"                  0.06
    "(EVENT"               0.06
    "oise"                 0.06
    " Lin"                 0.06
    " інт"                 0.06
    " notice"              0.06
    Activation Density: 0.011%

    No Known Activations