    Negative Logits
        Token             Value
        "ना"              1.11
        "ți"              1.03
        (not rendered)    1.03
        " I"              1.02
        " Și"             1.01
        "𝗨"               1.00
        " Մ"              0.99
        (not rendered)    0.94
        "𝗟"               0.93
        " Ս"              0.91
    Positive Logits
        Token             Value
        "ти"              1.28
        ","               1.21
        "RA"              1.20
        "ק"               1.08
        " for"            1.07
        "л"               1.06
        " consistente"    1.03
        "aste"            1.02
        ")"               1.00
        "्स"              0.98
    Activation Density: 0.004%

    No Known Activations