INDEX
    Explanations
    Negative Logits
    Token       Logit
    "ни"        2.17
    "𝖺"         1.84
    "μα"        1.81
    "ви"        1.71
    " elles"    1.70
    "stid"      1.70
    "𝗼"         1.70
    "ى"         1.70
    "𝗅"         1.69
    "ע"         1.66
    Positive Logits
    Token       Logit
    "IN"        2.02
    "EN"        1.94
    "IS"        1.85
    "EL"        1.85
    "ET"        1.77
    "ES"        1.76
    "ON"        1.75
    " तौर"      1.73
    "ED"        1.70
    "्योर"      1.69
    Activation Density: 0.012%

    No Known Activations