    Explanations

    rearing and rearranging

    Negative Logits
    ي    1.94
    י    1.88
    ت    1.48
    ה    1.47
    ם    1.41
    ه    1.24
    מ    1.20
    ۰    1.14
    i    1.13
     it    1.09
    Positive Logits
     (    1.36
    at    1.12
    ar    1.12
    ach    1.03
    og    1.02
    art    1.01
    il    0.98
    ers    0.96
    ast    0.95
    ى    0.95
    Activation Density 0.002%

    No Known Activations