INDEX
    Explanations

    code structure or markdown formatting

    Negative Logits
    𝐥       2.19
    اً      2.03
    های     1.95
    ००      1.90
    ydı     1.89
    "\      1.87
    𝐨       1.86
    𝐚       1.85
    𝐫       1.85
    ている  1.80
    Positive Logits
    ل       3.66
    in      3.57
            3.53
    us      3.31
    ת       3.30
    ي       3.21
    на      3.05
            3.00
    ি       2.88
    м       2.87
    Activation Density 0.313%

    No Known Activations