    Explanations
    Negative Logits
    Token           Logit
    "emer"          -0.07
    " khối"         -0.06
    " shifted"      -0.06
    " Integrity"    -0.06
    " trium"        -0.06
    "-Pack"         -0.06
    ".mvp"          -0.06
    "\tServer"      -0.06
    "pga"           -0.06
    " suicide"      -0.06
    Positive Logits
    Token           Logit
    ".getvalue"      0.07
    "яться"          0.07
    ".UN"            0.07
    " Postal"        0.06
    "eresa"          0.06
    " ric"           0.06
    " Toby"          0.06
    " Ч"             0.06
                     0.06
    "Ñ"              0.06
    Activation Density: 0.009%

    No Known Activations