    Negative logits
        "માં"                1.38
        "،"                  1.31
        (unrendered token)   1.17
        " as"                1.14
        (unrendered token)   1.09
        "یسر"                1.09
        (unrendered token)   1.08
        "би"                 1.07
        " шер"               1.06
        "ма"                 1.05
    Positive logits
        "ן"                  1.37
        "ع"                  1.25
        "y"                  1.16
        "n"                  1.09
        (unrendered token)   1.08
        "h"                  1.05
        " I"                 1.01
        "w"                  1.01
        "k"                  0.95
        "p"                  0.94
    Activation density: 0.006%

    No known activations.