INDEX
    NEGATIVE LOGITS
     as     1.18
            0.98
     is     0.97
    ↵↵      0.95
     in     0.95
            0.95
     i      0.93
    '       0.91
     of     0.90
     to     0.86
    POSITIVE LOGITS
    ي       1.61
    at      1.45
    ב       1.29
    يلا     1.26
    ле      1.22
    يل      1.14
    ين      1.13
    تي      1.13
    ви      1.11
            1.08
    Act Density 0.042%

    No Known Activations