    Negative Logits
    ת         4.05
    ي         3.88
    ة         3.48
    ed        3.11
    e         3.03
    es        3.01
    ب         2.97
    ه         2.86
    т         2.81
    tir       2.76
    Positive Logits
    р                        2.28
    (token not displayed)    2.28
    illum                    2.22
    δήποτε                   2.20
    𝓌                        2.08
    coords                   2.06
    (token not displayed)    1.99
    ği                       1.98
    ricted                   1.96
    acle                     1.96
    Activation Density: 0.139%

    No Known Activations