INDEX
    Explanations
    NEGATIVE LOGITS
    Token      Logit
    ٹ          2.48
     النظر     2.00
     `<`,      1.79
    ')$        1.73
    𝗿          1.71
     valour    1.70
    ₁.         1.69
     acabó     1.68
    ות         1.67
     იგი       1.66
    POSITIVE LOGITS
    Token      Logit
    dan        2.19
    re         1.72
    d          1.66
    il         1.66
    sto        1.65
    nd         1.64
    ynski      1.63
    peut       1.60
    ll         1.57
    ning       1.56
    Activation Density: 0.001%

    No Known Activations