    Negative Logits
        Token            Logit
        " overweight"    -0.07
        " dispro"        -0.06
        "_scheme"        -0.06
        " devis"         -0.06
        " Wellington"    -0.06
        " ward"          -0.06
        " Kevin"         -0.06
        "Name"           -0.06
        " pounds"        -0.06
        "arrass"         -0.06
    Positive Logits
        Token            Logit
        "avn"             0.08
        ".absolute"       0.07
        "#ga"             0.07
        " Using"          0.07
        " hòa"            0.07
        " begging"        0.07
        "-len"            0.06
        " ولكن"           0.06
        " развити"        0.06
                          0.06
    Act Density 0.049%

    No Known Activations