    Negative Logits
    Token             Logit
    " depressing"     -0.10
    "ами"             -0.09
    "\tdraw"          -0.08
    " инт"            -0.08
    "Uc"              -0.08
    " reduct"         -0.08
    " notor"          -0.08
    " дела"           -0.08
    " degeneration"   -0.07
    "\tMap"           -0.07
    Positive Logits
    Token             Logit
    "b"               0.08
    "Laser"           0.07
                      0.07
    "SOS"             0.07
    " Secretary"      0.07
    " potable"        0.07
    " Oro"            0.07
    "([..."           0.07
    " Institute"      0.07
    " Compact"        0.07
    Activation Density: 0.010%

    No Known Activations