INDEX
    Explanations

    instances of certain Russian nouns

    Negative Logits
    s           -0.54
    sut         -0.27
    Ùĩ          -0.27
    sheets      -0.24
    sah         -0.24
    Ñĭ          -0.24
    sik         -0.22
    sak         -0.21
    sÃŃ         -0.21
    à¸Ĺ         -0.20

    Positive Logits
    ÅĽci        0.18
    ÑĤеÑģÑĮ      0.16
    naire       0.15
    ONSE        0.15
    itched      0.15
    ÌĨ          0.15
    Ñıм         0.15
    Ñıми        0.15
    cpy         0.14
    cles        0.14

    Activation Density: 0.036%

    No Known Activations