INDEX
    Explanations

    understanding and design purpose

    Negative Logits
     obvious  0.80
     AgNO  0.76
    之力  0.76
     otro  0.74
    ся  0.74
     unsub  0.73
    >'  0.73
     trifle  0.72
     Eind  0.72
    dom  0.72
    Positive Logits
    i  0.97
    ی  0.96
    ли  0.87
    0.86
    י  0.82
    𝗹  0.79
    ভাবে  0.79
    ி  0.79
    ي  0.76
    ಕಾಶ  0.76
    Act Density 0.416%

    No Known Activations