    Explanations

    male/female

    Negative Logits
    "bullet"       -0.08
    " impro"       -0.08
    " improvis"    -0.07
    ",比如"        -0.07
    " destacados"  -0.07
    " discre"      -0.07
    " obligator"   -0.07
    " precisa"     -0.07
    "ował"         -0.07
    "neq"          -0.07
    Positive Logits
    " Raz"         0.09
    ".statistics"  0.08
    " Wider"       0.08
    " Rij"         0.07
    "(API"         0.07
    " fet"         0.07
    " Bent"        0.07
    " Bots"        0.07
    "urred"        0.07
    "ој"           0.07
    Activation Density: 0.013%

    No Known Activations