INDEX
    NEGATIVE LOGITS
    ید      1.03
    ف       1.00
    س       0.93
    ния     0.92
     are    0.90
    ır      0.88
    ג       0.84
    ני      0.84
     pasada 0.82
    ш       0.82
    POSITIVE LOGITS
    u       1.28
    il      1.14
    >       1.14
    inin    1.07
    r       1.05
            1.02
     for    0.99
    0       0.95
    ről     0.95
            0.95
    Act Density 0.445%

    No Known Activations