INDEX
    Negative Logits
    Token     Logit
    "ة"       1.07
    "il"      1.04
    "Y"       0.95
    "ر"       0.94
    "ing"     0.89
    "Z"       0.85
    "ан"      0.84
    "iraj"    0.82
    "ل"       0.80
    "вав"     0.79
    Positive Logits
    Token     Logit
    "t"       1.38
    "'"       1.16
    "y"       1.12
    "p"       1.05
    "to"      1.02
    "f"       1.02
              0.94
              0.92
    " to"     0.87
    "но"      0.87
    Activation Density: 0.208%

    No Known Activations