INDEX
    Negative Logits
    Token          Logit
    с              1.80
    س              1.32
    ور             1.20
    (not shown)    1.15
    ينا            1.14
    (not shown)    1.13
    ня             1.06
    ի              1.06
    н              1.05
    (not shown)    1.04

    Positive Logits
    Token          Logit
    t              2.44
    ?              1.44
    l              1.36
    w              1.34
    ti             1.34
    \              1.31
    r              1.27
    tr             1.25
    tive           1.25
    ts             1.24
    Act Density 0.001%

    No Known Activations