    Negative Logits
    و  2.16
    (token not shown)  2.13
    ל  2.09
    ات  1.95
    (token not shown)  1.84
    وط  1.70
     tradu  1.70
    ins  1.69
    ica  1.69
    ोर  1.69
    Positive Logits
    การ  1.75
     ihe  1.71
     LOWER  1.70
     ƙ  1.66
    𝘻  1.64
     dint  1.63
    До  1.62
     MARRI  1.62
    Ди  1.60
    (token not shown)  1.60
    Activation Density: 0.003%

    No Known Activations