INDEX
    Negative Logits
    Token       Logit
    ي           0.83
    ه           0.79
    u           0.77
    މ           0.75
    ת           0.74
    عرف         0.73
                0.73
    an          0.73
    ان          0.72
    a           0.71
    Positive Logits
    Token       Logit
    arsi        0.56
    aded        0.55
    ukt         0.53
    ák          0.52
    ICA         0.52
    ých         0.52
    imental     0.51
    ige         0.50
    rica        0.50
     Cine       0.48
    Activation Density: 0.001%

    No Known Activations