INDEX
    Explanations

    Russian, Spanish, or Bengali single letters

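    Negative Logits
        (token not rendered)   3.24
        ل                      2.53
        (token not rendered)   2.47
        l                      2.47
        s                      2.13
        دخل                    2.13
        (token not rendered)   2.12
        ాన్ని                  2.09
        н                      2.02
        lardan                 2.00
    Positive Logits
        (token not rendered)   1.74
        ý                      1.73
        𝘨                      1.72
        𝘬                      1.71
        𝘦                      1.67
        ными                   1.64
        ly                     1.63
        ছিলেন                  1.62
        (token not rendered)   1.62
        𝘰                      1.61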
    Activation Density 0.021%
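    Activation density is commonly reported as the fraction of sampled tokens on which a feature fires, so 0.021% corresponds to roughly 21 firing tokens per 100,000. The sketch below assumes that definition and a zero activation threshold. The `activation_density` helper and the sample numbers are hypothetical and illustrate only the arithmetic. They are not this feature's data or any dashboard's API.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes density = (# tokens where the feature fires) / (# tokens sampled).
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Illustrative numbers only: 21 firing positions out of 100,000 sampled tokens.
acts = [0.0] * 100_000
for i in range(21):
    acts[i * 1000] = 1.5  # arbitrary positive activation at a few positions

density = activation_density(acts)
print(f"{density:.3%}")
```

    A very low density like this one means the feature is sparse. It fires on only a small fraction of the tokens sampled.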

    No Known Activations