    Negative Logits
    " Bein"               -0.08
    " enim"               -0.08
    " hong"               -0.08
    " سبحانه"             -0.08
    "Ghost"               -0.07
    "lund"                -0.07
    " wu"                 -0.07
    " Beine"              -0.07
    " хам"                -0.07
    "。因此"              -0.07
    Positive Logits
    " approxim"           0.09
    " sufficiently"       0.09
    " approximation"      0.09
    "_PREC"               0.09
    " suficientemente"    0.08
    "illow"               0.08
    "precision"           0.08
    " cuidad"             0.08
    " aproxima"           0.08
    "ows"                 0.08
    Activation Density: 0.043%

    No Known Activations