    NEGATIVE LOGITS
    Token    Value
    "IS"     1.43
    "س"      1.41
    "ד"      1.37
    "ل"      1.30
    "İ"      1.29
    "أ"      1.26
    "то"     1.23
    "ס"      1.22
             1.18
    "с"      1.13
    POSITIVE LOGITS
    Token     Value
    "ou"      1.33
    "ier"     1.24
    " out"    1.23
    "iv"      1.16
    " the"    1.14
    " of"     1.09
    "the"     1.05
    "a"       1.05
    " it"     1.04
    "oung"    1.04
    Activation Density: 0.063%

    No Known Activations