    NEGATIVE LOGITS
    Token      Logit
    ו          1.34
    ни         1.30
    <0x0D>     1.29
    ли         1.28
    و          1.27
    </h2>      1.20
    </u>       1.12
    (blank)    1.06
    (blank)    1.06
    (blank)    1.06
    POSITIVE LOGITS
    Token      Logit
    d          1.53
    ↵↵         1.39
     for       1.33
     as        1.24
    ق          1.21
    ع          1.15
    (blank)    1.09
    (blank)    1.09
    د          1.06
    ول         1.02
    Activation Density: 0.077%

    No Known Activations