    Negative Logits
        "ه"   0.73
        "ו"   0.70
        "ח"   0.68
        "s"   0.65
        "an"  0.63
        "و"   0.63
        "ی"   0.63
        "ش"   0.61
        "al"  0.59
        "IT"  0.59
    Positive Logits
        (token not shown)  0.58
        " ίδια"            0.58
        " tours"           0.57
        " बल्ले"            0.55
        " tournée"         0.55
        " considérée"      0.54
        "गुना"              0.54
        (token not shown)  0.54
        " inversa"         0.54
        "𝚣"                0.54
    Act Density 0.002%

    No Known Activations