    Explanations

    mathematical expressions

    Negative Logits
    "ır"               1.27
    "{"                1.10
    "AD"               1.09
    " are"             1.08
    " is"              1.02
    "aries"            1.01
    "ae"               1.00
    (token missing)    0.97
    "هُ"               0.97
    (token missing)    0.96
    Positive Logits
    (token missing)    1.76
    "ן"                1.44
    "s"                1.30
    (token missing)    1.11
    "่า"               1.09
    "יי"               1.05
    (token missing)    1.05
    (token missing)    1.05
    "r"                1.04
    " expression"      1.02
    Activation Density 0.013%

    No Known Activations