    Negative Logits
    Token          Logit
    "("            1.54
    "{"            1.13
    (not shown)    1.09
    "ка"           1.08
    (not shown)    1.06
    "ك"            1.03
    "ري"           1.02
    "<td>"         1.02
    "arı"          1.02
    ";"            1.01
    Positive Logits
    Token          Logit
    "is"           1.56
    (not shown)    1.43
    "n"            1.34
    "al"           1.16
    "on"           1.13
    " for"         1.12
    "ک"            1.10
    "お客"         1.09
    "i"            1.09
    "it"           1.07
    Act Density: 0.099%

    No Known Activations