INDEX
    Explanations
    Negative Logits
    '       1.35
    פ       1.10
            1.09
    هم      1.09
    ת       1.08
    ل       1.05
    "       1.05
    ז       1.04
    ה       1.02
    сат     0.97
    Positive Logits
    in      1.41
    not     1.10
    i       1.06
    ي       0.98
    os      0.91
    de      0.91
    inį     0.90
    Olá     0.87
     I      0.85
    as      0.81
    Act Density 0.007%

    No Known Activations