    NEGATIVE LOGITS
    Token       Logit
    " आफ्नो"    0.47
    "自分の"    0.44
    "回事"      0.43
    " 자신의"   0.41
    "我们会"    0.41
    " ನನಗೆ"     0.41
    "小编"      0.38
    "Estoy"     0.37
    " swoich"   0.37
    " mnie"     0.37
    POSITIVE LOGITS
    Token       Logit
    " you"      2.05
    " você"     1.90
    " YOU"      1.82
    " شما"      1.80
    "you"       1.73
    " bạn"      1.72
    " आप"       1.63
    " You"      1.59
    "คุณ"       1.59
    "YOU"       1.59
    Activation density: 0.064%

    No Known Activations