INDEX
    Negative Logits
    ا         2.33
    های       2.02
    larda     2.02
    ı         2.02
    ü         1.88
    fh        1.84
    𝘬         1.84
     وفي      1.81
    অথ        1.79
    lardı     1.79
    Positive Logits
    d         2.19
    ine       2.16
              2.08
    STR       1.96
    %>%       1.84
    ir        1.84
    s         1.81
    al        1.77
    urface    1.77
    am        1.76
    Activation Density: 0.626%

    No Known Activations