    Explanations
    No Explanations Found
    Negative Logits
    ة    2.04
    я    2.04
    ed    1.94
    ο    1.71
    ם    1.69
     thousand    1.61
    an    1.59
    iyeti    1.55
    ה    1.54
    கோ    1.52
    Positive Logits
    𝗔    2.06
    𝐀    1.90
    𝗗    1.81
    preferences    1.78
    情況    1.77
     subordinate    1.77
    𝐃    1.74
     الهمزه    1.74
    <unused2164>    1.73
    1.73
    Activation Density 0.000%

    No Known Activations

    This feature has no known activations.