INDEX
    Explanations
    No Explanations Found
    Negative Logits
    i  1.52
    a  1.49
    ה  1.45
    י  1.28
    تها  1.22
    ி  1.20
    ه  1.20
    (blank)  1.16
    ların  1.15
    the  1.13
    Positive Logits
     at  1.22
    ь  1.17
    ug  1.08
    iv  1.03
    ικ  1.03
    ви  1.02
    ון  1.02
    هم  1.00
    िस  0.99
    يمة  0.98
    Act Density 0.000%
    No Known Activations