    NEGATIVE LOGITS
    [token not shown]    0.61
    " Θ"                 0.59
    " کرنا"              0.59
    [token not shown]    0.59
    " новой"             0.57
    " शाळे"              0.57
    " szkoły"            0.57
    " khí"               0.56
    " Π"                 0.55
    " هنر"               0.55
    POSITIVE LOGITS
    "as"                 0.80
    "r"                  0.68
    "u"                  0.63
    "ing"                0.61
    "s"                  0.54
    [token not shown]    0.53
    "st"                 0.53
    "AT"                 0.51
    "EA"                 0.49
    "بيه"                0.48
    Activation Density 0.015%

    No Known Activations