    Negative Logits
    "ani"     1.03
    "ת"       1.02
    "ні"      0.92
    "च्या"     0.87
    "ных"     0.86
    "ной"     0.85
    "ar"      0.82
    "í"       0.82
    "al"      0.81
    "ound"    0.80
    POSITIVE LOGITS
    "م"         1.26
    "B"         1.19
    "L"         1.09
    (blank)     1.08
    "<0x0D>"    1.05
    "N"         1.05
    "м"         1.03
    (blank)     0.98
    " to"       0.97
    "W"         0.96
    Act Density 0.006%

    No Known Activations