    NEGATIVE LOGITS
    (token missing)  0.79
    (token missing)  0.67
    "。</"  0.65
    (token missing)  0.64
    (token missing)  0.63
    "۔"  0.63
    (token missing)  0.62
    "یت"  0.62
    " بہت"  0.61
    "ंनी"  0.61
    POSITIVE LOGITS
    "t"    1.35
    "z"    1.02
    " of"  1.01
    "el"   0.98
    "er"   0.90
    "b"    0.78
    "r"    0.76
    "tj"   0.75
    "k"    0.74
    "us"   0.72
    Activation Density: 0.023%

    No Known Activations