INDEX
    Negative Logits
    " возникает"    -0.08
    "uncia"         -0.07
    " 이루"          -0.07
    "plash"         -0.07
    "834"           -0.07
    " Tight"        -0.07
    " layoffs"      -0.06
    " Sullivan"     -0.06
    ")size"         -0.06
    " surrender"    -0.06
    Positive Logits
    "_des"          0.07
    " lint"         0.07
    " пак"          0.07
    " equipments"   0.06
    "('.')"         0.06
    " goofy"        0.06
    " Verb"         0.06
    [token missing] 0.06
    " cookie"       0.06
    "(cs"           0.06
    Activation Density: 0.115%

    No Known Activations