INDEX
    NEGATIVE LOGITS
    Token      Logit
    "ни"       1.34
    "ן"        1.24
    "س"        1.19
    "نا"       1.18
    "с"        1.16
    "ના"       1.09
    "</h2>"    1.08
    " are"     1.06
    " It"      1.05
    "ну"       1.03
    POSITIVE LOGITS
    Token      Logit
    "h"        1.51
    "al"       1.47
    "n"        1.41
    "-"        1.41
    "l"        1.32
    "ה"        1.30
    "ل"        1.29
    "ב"        1.28
    "З"        1.15
    "Y"        1.12
    Activation Density: 0.006%

    No Known Activations