    Negative Logits
    Token           Logit
    "ibus"          -0.08
    " dedi"         -0.07
    " sze"          -0.07
    " assigning"    -0.07
    "Ano"           -0.07
    " prej"         -0.07
    " eman"         -0.07
    "itelj"         -0.07
                    -0.07
    " rtn"          -0.07
    Positive Logits
    Token           Logit
    " ka"           0.09
    " Cooper"       0.09
    " kilogram"     0.08
    " полож"        0.08
    " zinc"         0.08
    " tactics"      0.07
    "eroon"         0.07
    "ka"            0.07
    " প্রতি"          0.07
    " uz"           0.07
    Activation Density: 0.061%

    No Known Activations