INDEX
    Explanations
    Negative Logits
    Token           Logit
    " para"         -0.07
    "/use"          -0.07
    "/em"           -0.07
    "ify"           -0.07
    " focus"        -0.07
    " followed"     -0.07
    " Palestine"    -0.06
    " vell"         -0.06
    "yle"           -0.06
    "alyze"         -0.06
    Positive Logits
    Token           Logit
    " gabi"         0.09
                    0.09
    " negat"        0.09
    "ന്റെ"          0.08
    " Lief"         0.08
    " dạng"         0.08
    " negatif"      0.08
    " románt"       0.08
    "్డ"            0.08
    " Negative"     0.08
    Activation Density: 0.034%

    No Known Activations