INDEX
    Explanations
    No Explanations Found
    Negative Logits
    𝘁  2.77
    𝘰  2.59
    burse  2.57
    𝚝  2.55
    𝘦  2.51
    𝘳  2.51
    𝘵  2.50
    َّ  2.48
    (token not shown)  2.45
     `--  2.45
    Positive Logits
    м  4.81
    ه  4.81
    a  4.61
    ו  4.52
    i  4.07
    ي  3.97
    (token not shown)  3.94
    ۰  3.79
    (token not shown)  3.75
    ی  3.69
    Act Density 0.157%

    No Known Activations