    Negative Logits
    Logit   Token
    1.21    "0"
    1.11    "ни"
    1.05    "٠"
    1.04    "،"
    1.01    " ("
    0.96    "></"
    0.93    " a"
    0.91    " I"
    0.90    (not shown)
    0.85    (not shown)
    Positive Logits
    Logit   Token
    1.51    "ه"
    1.30    (not shown)
    1.27    "ה"
    1.24    "in"
    1.21    "a"
    1.14    "هو"
    1.14    (not shown)
    1.13    (not shown)
    1.12    "ช่วง"
    1.10    "هي"
    Activation Density: 0.001%

    No Known Activations