INDEX
    Explanations
    Negative Logits
    Token             Logit
    "Medical"         -0.06
    " oyuncu"         -0.06
    "peater"          -0.06
    " trường"         -0.06
    " kleinen"        -0.06
    "٢"               -0.06
    " tube"           -0.06
    " sexuality"      -0.06
    " Cycle"          -0.06
    "uckle"           -0.06
    Positive Logits
    Token             Logit
    " зав"            0.07
    " 上海"           0.07
    ".Imp"            0.06
    " embarrass"      0.06
    (blank in source) 0.06
    "нерг"            0.06
    " Mathematics"    0.06
    "мер"             0.06
    "\tClose"         0.06
    "uffers"          0.06
    Activation Density: 0.003%

    No Known Activations