    Negative Logits
    Token                Logit
    "י"                  1.15
    "ികി"                1.13
    (blank in source)    1.13
    "یی"                 1.08
    " are"               1.04
    "لە"                 1.00
    (blank in source)    0.99
    "ки"                 0.98
    "יש"                 0.94
    "।)"                 0.92
    Positive Logits
    Token                Logit
    "m"                  1.26
    "'"                  1.26
    "ing"                1.19
    "{"                  1.14
    (blank in source)    1.13
    (blank in source)    1.09
    "."                  1.08
    "কে"                 1.04
    "f"                  1.02
    "_"                  1.01
    Activation Density: 0.016%

    No Known Activations