    Negative Logits
    Token    Value
    "i"      1.45
    " on"    1.41
    "ü"      1.34
    "י"      1.32
    "ı"      1.26
    " l"     1.23
    "n"      1.23
    "ي"      1.22
    " o"     1.20
    "q"      1.18

    Positive Logits
    Token    Value
    "("      1.48
    "ους"    1.23
    "cidos"  1.20
    "س"      1.20
    "кові"   1.06
    "ش"      1.05
    "ра"     1.03
    "يك"     1.01
    "ўцаў"   1.00
    "ی"      1.00
    Activation density: 0.001%

    No known activations.