    Negative Logits
        " palm"        -0.07
        "Linear"       -0.07
        " shorthand"   -0.07
        "нем"          -0.07
        "нями"         -0.06
        "oufl"         -0.06
        " Palm"        -0.06
        " alpha"       -0.06
        "semantic"     -0.06
        " zřejmě"      -0.06
    Positive Logits
        " got"              0.16
        " Got"              0.12
        " GOT"              0.11
        "Got"               0.11
        "got"               0.09
        " forgot"           0.08
        " appropriations"   0.07
        " лов"              0.07
        " legit"            0.07
        "орот"              0.07
    Activation Density: 0.018%

    No Known Activations
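The activation-density figure is a rate: 0.018% corresponds to roughly 18 activating token positions per 100,000 tokens. The minimal Python sketch below shows that calculation, assuming density is the percentage of token positions where the feature's activation is strictly above zero. The function name `activation_density` and the sample data are hypothetical, and the dashboard's exact definition may differ.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of positions with an activation
    strictly above zero; this is an illustrative definition, not a confirmed one.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        # No tokens observed, so report zero density rather than dividing by zero.
        return 0.0
    return 100.0 * active / total


# Hypothetical data: 18 active positions spread across 100,000 tokens.
acts = [0.0] * 100_000
for i in range(18):
    acts[i * 5_000] = 1.0

print(f"{activation_density(acts):.3f}%")  # → 0.018%
```

The threshold is a parameter so the same sketch can model a stricter "firing" cutoff if the dashboard uses one.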