INDEX
    Explanations
    NEGATIVE LOGITS
    Token         Value
    "سي"          1.13
    "いますが"    1.10
    "ع"           1.07
    "arla"        1.06
    ","           1.06
    "ile"         1.05
    "ight"        1.05
    "un"          1.00
    "وو"          0.99
    "մ"           0.99
    POSITIVE LOGITS
    Token                 Value
    "ه"                   1.63
    "in"                  1.34
    "a"                   1.31
    (no visible token)    1.28
    " by"                 1.16
    "াৰ"                  1.14
    "</h1>"               1.13
    "kker"                1.07
    (no visible token)    1.05
    "st"                  1.04
    Activation Density: 0.004%

    No Known Activations