INDEX
    NEGATIVE LOGITS
    Token       Logit
    ä           1.52
     is         1.38
    ing         1.07
    was         0.89
    isk         0.83
                0.83
    ik          0.82
    t           0.82
    rie         0.80
     synthes    0.80
    POSITIVE LOGITS
    Token       Logit
    ו           1.08
    تي          1.05
    كار         1.03
    斯的          1.03
    ك           1.03
    ция         1.02
    تك          1.00
    ٥           0.96
    ్‌          0.95
    দের         0.92
    Act Density 0.378%

    No Known Activations