    Negative Logits
    Token            Logit
    " pencils"       -0.07
    " publish"       -0.06
    "_wire"          -0.06
    " manifest"      -0.06
    "\tuser"         -0.06
    " пол"           -0.06
    " empirical"     -0.06
    "_MAX"           -0.06
    " journeys"      -0.06
    " correspond"    -0.06
    Positive Logits
    Token            Logit
    " Köy"           0.07
    " saat"          0.07
    " descargar"     0.07
    "(disposing"     0.07
    "(hist"          0.06
    " viêm"          0.06
    " monday"        0.06
    " War"           0.06
    " Někter"        0.06
    [token missing]  0.06
    Activation Density: 0.001%

    No Known Activations