INDEX
    Explanations
    Negative Logits
     longstanding    -0.07
     Vak             -0.07
    Thin             -0.06
     Bak             -0.06
     loginUser       -0.06
     BMP             -0.06
    	glog            -0.06
     лю              -0.06
    ۴                -0.06
    _leave           -0.06
    Positive Logits
    .'<              0.07
     steer           0.07
    hamster          0.07
    cr               0.07
    _inverse         0.07
     خارجی           0.06
                     0.06
     cravings        0.06
     crust           0.06
    .LINE            0.06
    Act Density 0.065%

    No Known Activations