    Negative Logits
    Token (quoted)    Logit
    "'"               1.40
    "ä"               1.15
    (not shown)       1.13
    "ها"              1.13
    "t"               1.10
    "ا"               1.06
    "et"              1.05
    "د"               1.05
    "ek"              1.04
    "("               1.04
    Positive Logits
    Token (quoted)    Logit
    " بير"            1.02
    " ني"             1.01
    "<0xF3>"          1.00
    " وي"             1.00
    ";"               1.00
    "يره"             0.99
    " غير"            0.98
    " quell"          0.98
    " وين"            0.98
    " ويت"            0.98
    Activation Density: 0.005%

    No Known Activations