    NEGATIVE LOGITS
    g           1.01
    '/          1.00
    '$          0.95
                0.93
    '>");       0.91
                0.91
    ../../      0.89
     Greater    0.89
    '{          0.87
    ج           0.85

    POSITIVE LOGITS
    𝓁           1.11
     таке       1.06
     quirk      1.04
    ТЕ          1.00
    𝑜           0.99
     devil      0.98
     подходит   0.97
    𝑢           0.97
     domać      0.94
    𝒸           0.93
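The page does not say how these token rankings were produced. Assuming they follow the approach common in sparse-autoencoder dashboards (projecting the feature's decoder direction through the model's unembedding matrix, then keeping the highest- and lowest-scoring tokens), a minimal pure-Python sketch looks like this. The function names `logit_effects` and `top_and_bottom`, the matrix layout, and the toy numbers are all hypothetical; the sketch returns signed scores.

```python
# Hypothetical sketch: rank vocabulary tokens by the direct effect of a
# feature's decoder direction on the output logits (assumed method).

def logit_effects(decoder_dir, unembed):
    """Return one score per vocabulary token.

    decoder_dir: list of d_model floats (the feature's decoder direction).
    unembed: list of d_model rows, each a list of vocab_size floats,
             i.e. an unembedding matrix W_U of shape [d_model, vocab_size].
    """
    vocab_size = len(unembed[0])
    # Score for token t is the dot product of decoder_dir with column t of W_U.
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 hypothetical tokens.
    decoder_dir = [1.0, -0.5]
    unembed = [[0.2, 1.0, -0.3, 0.0],
               [0.4, -1.0, 0.6, 0.0]]
    effects = logit_effects(decoder_dir, unembed)
    pos, neg = top_and_bottom(effects, ["a", "b", "c", "d"], k=1)
    print("positive:", pos)
    print("negative:", neg)
```

With a real model, `k` would be 10 to match the lists above, and the vocabulary would be the tokenizer's decoded tokens, which is why entries such as leading-space words and punctuation fragments appear.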
    Activation density: 0.016%

    No Known Activations
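As a rough reading of the activation density figure: 0.016% corresponds to about 16 active positions in every 100,000 tokens. The sketch below assumes density means the percentage of sampled token positions whose activation exceeds zero; both the zero threshold and the function name `activation_density` are assumptions, not taken from this page.

```python
# Hypothetical sketch: activation density as the percentage of token
# positions where a feature's activation is above a threshold (assumed 0.0).

def activation_density(activations, threshold=0.0):
    """Return the percentage of positions with activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 16 active positions out of 100,000 sampled tokens.
    acts = [0.0] * 99_984 + [2.5] * 16
    print(f"{activation_density(acts):.3f}%")
```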