INDEX
    Explanations
    Negative Logits
    "itters"       -0.07
    "untary"       -0.07
    "انيا"         -0.07
    " 바로"        -0.07
    "cribes"       -0.06
    " kinetics"    -0.06
    " microbes"    -0.06
    " Çünkü"       -0.06
    "ZO"           -0.06
    "Captain"      -0.06
    Positive Logits
    " velkou"        0.07
    " risult"        0.07
    " advancement"   0.07
    "\tcb"           0.06
    " vo"            0.06
    "_SPI"           0.06
    " utter"         0.06
    " puis"          0.06
    "mir"            0.06
    "\tpage"         0.06
    Activation Density: 0.019%

    No Known Activations