    NEGATIVE LOGITS
    "</span>"       1.07
    "t"             1.06
    " f"            0.89
    " an"           0.88
    "ка"            0.88
    " on"           0.88
    "</h2>"         0.87
    " o"            0.84
    " t"            0.84
    " '"            0.84
    POSITIVE LOGITS
    "ي"             1.60
    "י"             1.30
    " брать"        1.13
    " thoroughly"   0.95
    (not rendered)  0.95
    "i"             0.95
    "ق"             0.95
    "at"            0.94
    (not rendered)  0.93
    "ين"            0.91
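Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The sketch assumes that convention; the helper `feature_logit_effects` and the toy weights are hypothetical and not taken from this dashboard's code.

```python
def feature_logit_effects(decoder_row, unembed, vocab, k):
    """Return the k most-promoted and k most-suppressed tokens for one feature.

    Hypothetical helper (assumed logit-lens convention):
    decoder_row: list[float], the feature's decoder direction (length d_model)
    unembed:     list[list[float]], shape d_model x vocab_size
    vocab:       list[str], token strings indexed by token id
    """
    if len(decoder_row) != len(unembed):
        raise ValueError("decoder_row length must equal d_model")
    # Logit contribution of token t = dot(decoder_row, unembed[:, t]).
    scores = [
        sum(decoder_row[i] * unembed[i][t] for i in range(len(decoder_row)))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens, lowest score first
    positive = ranked[::-1][:k]  # most promoted tokens, highest score first
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 3, vocabulary of 4 hypothetical tokens.
    pos, neg = feature_logit_effects(
        [1.0, 0.0, 0.5],
        [[1, 0, -1, 2], [5, 5, 5, 5], [0, 4, 0, -1]],
        ["a", "b", "c", "d"],
        2,
    )
    print(pos)  # [('b', 2.0), ('d', 1.5)]
    print(neg)  # [('c', -1.0), ('a', 1.0)]
```

Real dashboards may rescale or drop the sign of the negative scores before display, so the toy values are illustrative only.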
    Activation Density 0.131%
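Activation density is typically read as the fraction of sampled token positions on which the feature's activation is above zero, so 0.131% corresponds to roughly 131 firing positions per 100,000 tokens. A minimal sketch, assuming that definition and a hypothetical `activation_density` helper:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires above `threshold`.

    Hypothetical helper; assumes "fires" means activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    # 131 firing positions out of 100,000 sampled tokens.
    acts = [1.0] * 131 + [0.0] * 99_869
    print(f"{activation_density(acts):.3%}")  # 0.131%
```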

    No Known Activations