INDEX
    Explanations

    names followed by associated terms

    Negative Logits
    (no visible token)    1.16
    ه    1.09
    (no visible token)    1.04
    %    1.01
    د    1.00
    (no visible token)    0.99
    h    0.98
    _    0.97
    (no visible token)    0.97
     or    0.96
    Positive Logits
    𝐨    1.24
    تری    1.10
    𝐢    1.09
    تين    1.08
    1    1.06
    (no visible token)    1.05
     تقديم    1.05
    𝐜    1.05
     магнит    1.03
    ت    1.03
    Act Density 0.005%

    No Known Activations