    Negative Logits
        Token                   Logit
        "ها"                    1.27
        (token not rendered)    1.18
        (token not rendered)    1.14
        "ך"                     1.13
        " are"                  1.09
        (token not rendered)    1.08
        "はなく"                 1.06
        "۔"                     1.04
        " "                     1.03
        "ли"                    1.01
    Positive Logits
        Token                   Logit
        "t"                     1.57
        "ar"                    1.49
        "u"                     1.30
        (token not rendered)    1.25
        "dj"                    1.09
        "tc"                    1.05
        "sw"                    1.02
        "tól"                   1.02
        "sc"                    1.00
        "ten"                   0.99
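Dashboards of this kind usually derive the two logit lists by multiplying the feature's decoder direction by the model's unembedding matrix and reading off the largest and smallest entries. Below is a minimal pure-Python sketch of that procedure. It assumes a decoder vector `w_dec` of length d_model and an unembedding `W_U` laid out as d_model rows of d_vocab entries. The function names, the toy vocabulary, and the weights are hypothetical and are not values from this feature.

```python
from typing import List, Sequence, Tuple


def project_to_vocab(w_dec: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    """Return w_dec @ W_U: the feature's direct contribution to each vocabulary logit.

    Assumes (hypothetically) that W_U is laid out as d_model rows of d_vocab entries.
    """
    if len(w_dec) != len(W_U):
        raise ValueError("w_dec length must equal the number of rows in W_U")
    logits = [0.0] * len(W_U[0])
    for weight, row in zip(w_dec, W_U):
        for j, u in enumerate(row):
            logits[j] += weight * u
    return logits


def top_logit_tokens(
    w_dec: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative): the k highest and the k lowest (token, logit) pairs."""
    logits = project_to_vocab(w_dec, W_U)
    # Vocabulary indices sorted from lowest to highest logit.
    ascending = sorted(range(len(logits)), key=lambda j: logits[j])
    positive = [(vocab[j], logits[j]) for j in ascending[::-1][:k]]
    negative = [(vocab[j], logits[j]) for j in ascending[:k]]
    return positive, negative


# Hypothetical toy example: d_model = 2, four-token vocabulary.
vocab = ["a", "b", "c", "d"]
w_dec = [1.0, -0.5]
W_U = [
    [0.9, 0.4, -0.2, -0.8],
    [0.2, -0.6, 0.6, 0.4],
]
pos, neg = top_logit_tokens(w_dec, W_U, vocab, k=2)
print([tok for tok, _ in pos])  # ['a', 'b']
print([tok for tok, _ in neg])  # ['d', 'c']
```

Sorting the whole vocabulary keeps the sketch short. For a real vocabulary of tens of thousands of tokens, `heapq.nlargest` and `heapq.nsmallest` with a `key` would avoid the full sort.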
    Act Density 0.041%
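Activation density is usually reported as the share of sampled tokens on which the feature fires, meaning its activation is strictly positive. Read that way, 0.041% corresponds to roughly 41 firing tokens per 100,000. The sketch below assumes one activation value per token and a firing threshold of zero. The function name and the synthetic sample are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation is strictly above `threshold`.

    Assumes `activations` holds one activation value per sampled token.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one token")
    return 100.0 * active / total


# Synthetic sample: 41 firing tokens spread across 100,000 tokens.
sample = [0.0] * 100_000
for i in range(41):
    sample[i * 2_000] = 1.0
print(f"{activation_density(sample):.3f}%")  # prints 0.041%
```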

    No Known Activations