    NEGATIVE LOGITS
    "س"      1.34
    "ICK"    1.30
    "ي"      1.27
    "EN"     1.23
    " a"     1.16
    "to"     1.15
    "İN"     1.14
    " as"    1.13
    "،"      1.02
    "IST"    0.98
    POSITIVE LOGITS
    "an"     1.33
    "ä"      1.20
    "in"     1.07
    "ing"    1.02
    "ת"      0.97
    "ుడు"    0.94
    "ים"     0.93
    "("      0.93
    "ির"     0.92
             0.91
    Activation Density: 0.022%

    No Known Activations