INDEX
    Negative Logits
    Token          Value   English gloss
    " your"        0.70
    "Your"         0.64
    "你的"         0.64    (your)
    "your"         0.63
    " yourself"    0.62
    "yourself"     0.56
    " Your"        0.55
    " a"           0.53
    "يدك"          0.49    (your hand)
    " youre"       0.49
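    Positive Logits
    Token          Value   English gloss
    " ourselves"   2.11
    " хотим"       1.30    (we want)
    "如果我们"     1.30    (if we)
    " نحن"         1.29    (we)
    " можем"       1.27    (we can)
    " 우리는"      1.26    (we)
    " dobbiamo"    1.23    (we must)
    "当我们"       1.23    (when we)
    " nossas"      1.23    (our)
    " знаем"       1.23    (we know)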
    Act Density 0.100%

    No Known Activations