INDEX
    Explanations
    Negative Logits
    Token         Logit
    " Johan"      -0.07
    " finan"      -0.07
    " نسبت"       -0.07
    "Guardar"     -0.07
    "iled"        -0.07
    "iniz"        -0.06
    "CCCCCC"      -0.06
    " ولكن"       -0.06
    "obic"        -0.06
    " трь"        -0.06
    Positive Logits
    Token             Logit
    " dad"            0.07
    " Dad"            0.07
    (not rendered)    0.06
    (not rendered)    0.06
    "\tlocal"         0.06
    " credible"       0.06
    (not rendered)    0.06
    " 따른"            0.06
    " hottest"        0.06
    " accidentally"   0.06
    Activation Density: 0.015%

    No Known Activations