    NEGATIVE LOGITS
    Token     Logit
    "i"       2.00
    "t"       1.41
    "is"      1.34
    "as"      1.34
    "u"       1.31
    "o"       1.30
    "il"      1.27
    "iato"    1.24
    "ма"      1.23
    "it"      1.22
    POSITIVE LOGITS
    Token             Logit
    "U"               1.26
    [not rendered]    1.13
    "С"               1.11
    " ي"              1.09
    " ánh"            1.09
    "ד"               1.09
    " ül"             1.05
    "O"               1.02
    "З"               1.02
    [not rendered]    1.01
    Activation density: 0.035%

    No Known Activations