    NEGATIVE LOGITS
     Syl        -0.09
     Evid       -0.08
    &M          -0.08
                -0.08
    pok         -0.07
     Diz        -0.07
     Sail       -0.07
     prof       -0.07
     """↵↵      -0.07
     Ums        -0.07
    POSITIVE LOGITS
    ircle       0.09
     ס          0.09
     oval       0.08
     مرا        0.08
     पै         0.08
    িত্র        0.08
     centros    0.08
     berl       0.07
                0.07
    âmara       0.07
    Activation Density: 0.250%

    No Known Activations