INDEX
    Negative Logits
    ра    1.23
    قي    1.21
    ني    1.20
    يز    1.20
    صميم    1.13
    0    1.13
    ضي    1.13
     مي‌    1.12
     في    1.11
    𝟎    1.10
    Positive Logits
    [blank]    1.70
    n    1.63
    [blank]    1.30
    an    1.20
    ل    1.19
    i    1.15
    l    1.15
    ली    1.13
    ת    1.11
     with    1.08
    Activation Density: 0.049%

    No Known Activations