    Explanations

    words related to literature and academic concepts

    Negative Logits
    ška        -0.16
    тивного    -0.16
    vím        -0.15
     nimi      -0.15
    owe        -0.15
    imi        -0.15
    aux        -0.15
    LARI       -0.14
    ivid       -0.14
    aggi       -0.14

    Positive Logits
    nej         0.37
    owej        0.36
    cej         0.35
    лой         0.32
    щей         0.31
    анной       0.31
    енной       0.31
    ной         0.31
    ской        0.30
    zej         0.30

    Act Density 0.058%

    No Known Activations