    Negative Logits
    Token    Logit
    ד        1.38
    ير       1.24
    يه       1.23
    يت       1.06
    كو       0.98
    ח        0.97
             0.95
    يع       0.93
    وب       0.93
    ق        0.92
    Positive Logits
    Token    Logit
    at       2.28
    as       1.66
    ed       1.60
    et       1.57
    n        1.49
    o        1.41
    el       1.38
    ing      1.35
    down     1.30
    en       1.27
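    On sparse-autoencoder feature dashboards, the positive and negative logit lists are typically the tokens whose unembedding vectors have the largest and smallest dot products with the feature's decoder direction. Below is a minimal sketch under that assumption. Toy two-dimensional vectors stand in for real model weights, and the function names and values are hypothetical.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    # Assumed definition: a token's logit effect is the dot product of the
    # feature's decoder direction with that token's unembedding vector.
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_k(effects: Dict[str, float],
          k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Positive list: largest effects first. Negative list: smallest first.
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# Hypothetical toy weights, not taken from any real model.
decoder_dir = [1.0, -0.5]
unembed = {"at": [2.0, 0.0], "as": [1.0, -1.0], "ير": [-1.0, 0.5]}

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_k(effects, 1)
print(positive, negative)  # [('at', 2.0)] [('ير', -1.25)]
```

    Sorting the whole vocabulary is fine for a sketch; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` avoids a full sort.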
    Activation Density: 0.340%
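    Activation density is commonly reported as the percentage of sampled tokens on which a feature's activation is nonzero. Under that reading, 0.340% corresponds to roughly 3.4 activating tokens per 1,000. Below is a minimal sketch, assuming that a token counts as active when its activation is strictly positive. The function name and sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Return the percentage of tokens whose activation is strictly positive.

    Assumption: "active" means activation > 0.
    """
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# Toy data: 1,000 tokens, 3 of which activate the feature.
sample = [0.0] * 997 + [1.2, 0.4, 2.5]
print(f"{activation_density(sample):.3f}%")  # 0.300%
```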

    No Known Activations