INDEX
    Explanations
    Negative Logits
    Token          Value
    "."            1.17
    "ot"           1.08
    "ren"          1.04
    (not shown)    1.04
    (not shown)    0.96
    "are"          0.94
    "2"            0.93
    "ances"        0.90
    "7"            0.90
    "OR"           0.86
    Positive Logits
    Token          Value
    "f"            1.23
    "ה"            1.22
    "ه"            1.13
    "n"            1.05
    "ع"            0.95
    " beho"        0.94
    "ли"           0.93
    "ת"            0.90
    "عين"          0.87
    "a"            0.86
    Activation Density 0.013%

    No Known Activations