INDEX
    Explanations
    Negative Logits
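    ن    0.73
    ూరు    0.73
    (token not shown)    0.66
    (token not shown)    0.66
    (token not shown)    0.65
    (token not shown)    0.65
    𝟯    0.65
    (token not shown)    0.64
    𝐩    0.64
    (token not shown)    0.64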
    Positive Logits
    ים    0.86
    s    0.73
     wares    0.70
    <0x0D>    0.70
    en    0.68
    ти    0.66
    ओं    0.66
     đôi    0.65
    வில்    0.64
     (“    0.63
    Act Density 0.006%

    No Known Activations