    Negative Logits
    "ي"         1.48
    "ב"         1.38
                1.30
    "З"         1.29
    "У"         1.28
                1.26
    " alleys"   1.23
    "ה"         1.22
    " on"       1.21
    "ور"        1.21
    Positive Logits
    "r"         1.29
    "ite"       1.27
    "ip"        1.17
    "they"      1.17
    "."         1.16
                1.16
    "'"         1.13
    " ("        1.09
    "ate"       1.08
    "ud"        1.06
    Activation Density: 0.012%

    No Known Activations