INDEX
    Explanations
    Negative Logits
     surrendered    -0.07
                    -0.06
     testified      -0.06
    ужд             -0.06
     near           -0.06
     package        -0.06
     summ           -0.06
    -risk           -0.06
    "},{"           -0.06
     einen          -0.06
    Positive Logits
    =__             0.07
     memnun         0.07
    (heap           0.07
     loggedIn       0.07
    _call           0.06
    itesse          0.06
    ycl             0.06
    	mask           0.06
     cabo           0.06
    ắng             0.06
    Act Density 0.017%

    No Known Activations