INDEX
    Explanations
    Negative Logits
    "ن"        0.79
    "𝗡"        0.73
    " in"      0.72
    "ukone"    0.68
    (blank)    0.68
    "𝗔"        0.68
    "𝗛"        0.66
    "𝗘"        0.66
    (blank)    0.66
    "𝗧"        0.66
    Positive Logits
    "ية"       0.81
    "					"    0.72
    " a"       0.70
    "q"        0.70
    "("        0.69
    (blank)    0.68
    "ка"       0.68
    " que"     0.67
    "z"        0.66
    "ста"      0.65
    Activation Density: 0.016%

    No Known Activations