    Negative Logits
    Token        Logit
    "s"          2.23
    "k"          1.77
    "nar"        1.73
    "ERS"        1.70
    "mselves"    1.62
    "sy"         1.57
    "nin"        1.56
    "nk"         1.56
    " pudd"      1.55
    "nings"      1.51
    Positive Logits
    Token        Logit
    "ل"          3.30
    "ج"          2.38
    "گ"          2.20
    "ש"          2.02
    "т"          1.96
    "и"          1.94
    "ك"          1.90
    "м"          1.89
    "то"         1.88
    "ان"         1.88
    Activation Density: 0.446%

    No Known Activations