    NEGATIVE LOGITS
    Token      Logit
    "over"     1.52
    "ur"       1.45
    "ot"       1.41
    "um"       1.38
    "1"        1.36
    "ong"      1.25
    "otur"     1.18
    "of"       1.17
    "v"        1.16
    "ete"      1.14
    POSITIVE LOGITS
    Token            Logit
    " forward"       1.46
    " вперед"        1.31
    " сдела"         1.25
    "सँग"             1.23
    " powied"        1.19
    (not captured)   1.19
    " débil"         1.16
    " отмети"        1.12
    " pregunt"       1.11
    "ر"              1.10
    Activation Density: 0.026%

    No Known Activations