    Negative Logits
    m      1.45
    al     1.25
    d      1.21
    et     1.20
    ah     1.20
    UN     1.13
    os     1.13
    ன்      1.12
    ن      1.10
    س      1.10
    Positive Logits
    ти     1.70
    ti     1.26
    ца     1.20
    tl     1.10
    τα     1.05
    tm     1.02
    ונה    1.01
    ков    0.99
    and    0.97
    τι     0.97
    Activation Density: 0.004%

    No Known Activations
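The figures above can be read with a small sketch. It assumes this is a dashboard for a learned feature (for example, a sparse-autoencoder latent). It also assumes that the positive and negative logit lists come from projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores, and that activation density is the percentage of tokens on which the feature's activation is above zero. Under that reading, 0.004% means the feature fires on roughly 1 token in 25,000. The function names, parameters, and toy inputs below are hypothetical and illustrative, not the dashboard's actual code. Note that the sketch returns signed scores, so its negative list contains negative numbers.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str],
                  k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical sketch: rank tokens by decoder_dir @ unembed.

    decoder_dir: length d_model (assumed to be the feature's decoder row)
    unembed: d_model x vocab_size matrix (assumed to be the model's unembedding)
    Returns (top_positive, top_negative), each a list of (token, score).
    """
    d_model = len(decoder_dir)
    # Score for token j is the dot product of the decoder direction
    # with column j of the unembedding matrix.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    pairs = list(zip(vocab, scores))
    # Largest scores first: tokens the feature pushes up.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    # Smallest (most negative) scores first: tokens the feature pushes down.
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of tokens whose activation is above zero (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    vocab = ["a", "b", "c"]
    decoder_dir = [1.0, 0.0]
    unembed = [[0.5, -1.0, 2.0],
               [3.0, 3.0, 3.0]]
    pos, neg = logit_effects(decoder_dir, unembed, vocab, k=2)
    print(pos, neg)
    print(activation_density([0.0, 0.3, 0.0, 0.0]))
```

With the toy inputs, only the first row of the unembedding contributes, so token `c` ranks highest and token `b` lowest. Swapping in real model weights would require loading them with whatever library hosts the model.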