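    Negative Logits
    (Tokens are quoted; a leading space is part of the token.)
        " одном"       -1.04   (Russian, "one")
        "someone"      -1.02
        "何か"          -1.00   (Japanese, "something")
        "postolic"     -0.99
        "numb"         -0.91
        " reçoit"      -0.90   (French, "receives")
        " qualcosa"    -0.90   (Italian, "something")
        " iemand"      -0.90   (Dutch, "someone")
        "ceptable"     -0.89
        " indicando"   -0.89   (Spanish/Portuguese/Italian, "indicating")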
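    Positive Logits
        " a"           4.84
        " an"          3.28
        " একটি"         1.72   (Bengali, "a/one")
        "是一个"        1.66   (Chinese, "is a")
        "一个"          1.63   (Chinese, "a/one")
        " một"         1.44   (Vietnamese, "a/one")
        "是个"          1.44   (Chinese, "is a")
        " eine"        1.41   (German, "a")
        "了一個"        1.40   (Chinese, aspect particle 了 + "a/one")
        " एक"          1.38   (Hindi, "a/one")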
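The page does not say how these lists were produced. A common approach in sparse-autoencoder feature dashboards, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and report the lowest- and highest-scoring tokens. A minimal NumPy sketch under that assumption (the function name `top_bottom_logits` and its parameters are hypothetical):

```python
import numpy as np

def top_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Project a feature direction through the unembedding (assumed method).

    decoder_dir: shape (d_model,), the feature's decoder direction (hypothetical input)
    W_U:         shape (d_model, d_vocab), the model's unembedding matrix
    vocab:       list of d_vocab token strings
    Returns (negative, positive): the k lowest and k highest (token, logit) pairs.
    """
    logits = decoder_dir @ W_U          # one direct-logit score per vocabulary token
    order = np.argsort(logits)          # indices sorted by ascending score
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive
```

For example, with a toy two-dimensional model whose unembedding scores the tokens " a", " an", "someone", "numb" as 4.0, 3.0, -1.0, -0.5 along the feature direction, calling it with `k=2` returns " a" and " an" as the positive list and "someone" and "numb" as the negative list.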
    Activation Density 0.267%
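Activation density is usually read as the fraction of sampled tokens on which the feature fires; 0.267% corresponds to roughly 2.67 tokens per 1,000. The sketch below assumes that definition and counts any activation above a threshold of 0 as firing. Both the threshold and the function name `activation_density` are assumptions, not taken from the source.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which a feature is active (assumed definition).

    acts:      1-D sequence of the feature's activation on each sampled token
    threshold: activations strictly above this count as "active" (assumed 0.0)
    """
    acts = np.asarray(acts, dtype=float)
    # Mean of a boolean mask = share of tokens where the feature fires.
    return float(np.mean(acts > threshold))
```

A feature that fires on 3 of 1,000 sampled tokens would have a density of 0.003, shown as 0.300% in this dashboard's units.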

    No Known Activations