INDEX
    Explanations

    nam followed by suffixes

    NEGATIVE LOGITS
    (token missing)  1.20
    هم  1.09
    (token missing)  1.07
    (token missing)  1.06
    (token missing)  1.04
    (token missing)  1.03
    (token missing)  1.02
    री  1.01
    يس  1.01
    ियों  1.00
    POSITIVE LOGITS
    u     1.86
    o     1.73
    h     1.70
    e     1.41
    y     1.34
    z     1.22
    i     1.13
    have  1.11
    p     1.09
    t     1.05
    Act Density 0.000%

    No Known Activations