    NEGATIVE LOGITS
    Token               Logit
    " sabot"            -0.09
    " القد"             -0.09
    " pissed"           -0.09
    " sédu"             -0.08
    " Elite"            -0.08
    " tuer"             -0.08
    "jene"              -0.08
    " controvers"       -0.08
    " verlaten"         -0.08
    " spectaculaire"    -0.08
    POSITIVE LOGITS
    Token               Logit
    " intervals"        0.09
    " M"                0.08
    "-inter"            0.07
    "Intervals"         0.07
    " satisfying"       0.07
    " ит"               0.07
    "-f"                0.07
    "Interval"          0.07
    " ]];"              0.07
    " reunión"          0.07
    Activation density: 0.002%

    No known activations.