INDEX
    Explanations
    NEGATIVE LOGITS
    " vessel"        -0.06
    " chapter"       -0.06
    " consumes"      -0.06
    "ustom"          -0.06
    "\t"             -0.06
    " welfare"       -0.06
    " [_"            -0.06
    "яет"            -0.06
    " maths"         -0.06
    " managerial"    -0.06
    POSITIVE LOGITS
    "ادية"           0.07
    "-we"            0.07
    "ovně"           0.07
    " BIO"           0.07
    "ERING"          0.06
    " scrambling"    0.06
    "_ic"            0.06
    " Eric"          0.06
    " goofy"         0.06
    "ering"          0.06
    Activation Density: 0.002%

    No Known Activations