    Negative Logits
     and    0.76
     आणि    0.71
     ਅਤੇ    0.70
    했고    0.66
     এবং    0.66
    시고    0.61
     και    0.59
     ಮತ್ತು    0.59
     ۽    0.57
     as    0.57
    Positive Logits
    لى    0.69
     kunnen    0.63
     họ    0.61
     hanno    0.58
     trên    0.56
    ى    0.56
     onlar    0.56
     têm    0.55
        0.54
    َى    0.54
    Activation Density: 0.033%

    No Known Activations