INDEX
    Explanations

    processing text, images, data

    Negative Logits
    ır        1.01
    ری        0.94
    .         0.92
    ot        0.88
    ני        0.87
     be       0.81
    ատ        0.80
    ного      0.79
    ید        0.79
              0.78
    Positive Logits
    T         1.24
    د         1.21
    ↵↵        1.12
    ع         1.12
    d         1.05
    f         1.00
    V         1.00
    b         0.98
              0.96
    A         0.95
    Act Density 0.071%

    No Known Activations