    Negative Logits
     tenho              -0.08
     inc                -0.07
     लगे                -0.07
    itty                -0.07
     Mun                -0.07
     emprego            -0.07
    YL                  -0.07
     representation     -0.07
     participant        -0.07
     plaintext          -0.07
    Positive Logits
     Irene              0.08
     गिर                0.07
     hemp               0.07
     পত                 0.07
     resigned           0.07
    _gap                0.07
    ‍ത്ത                 0.07
                        0.07
     crumble            0.07
     pra                0.07
    Activation Density: 0.001%

    No Known Activations