INDEX
    Explanations

    terms related to archives and organizational structures

    Negative Logits
    sk      -0.21
    se      -0.18
    st      -0.18
    l       -0.17
    la      -0.17
    sh      -0.15
    ils     -0.15
    -sk     -0.15
    osh     -0.15
    s       -0.15

    Positive Logits
    ный     0.59
    ные     0.57
    ное     0.56
    ная     0.55
    ной     0.54
    ных     0.52
    ную     0.51
    ным     0.51
    ного    0.50
    ными    0.47
    Activation Density: 0.026%

    No Known Activations