INDEX
    Explanations

    female pronouns and words like daughter and husband that refer to women

    Negative Logits
    " her"      -4.25
    "her"       -2.48
    " hers"     -2.25
    "彼女の"    -2.13
    " herself"  -2.11
    "她的"      -2.05
    " 그녀"     -2.02
    " hennes"   -2.00
    " haar"     -1.91
    " ее"       -1.89
    Positive Logits
    " betreft"      0.63
    " maxn"         0.62
    " socialista"   0.59
    "pushd"         0.57
    "rawan"         0.56
    " elä"          0.56
    " görünü"       0.56
    " vettoriale"   0.56
    "drawSprites"   0.56
    " tiegħ"        0.55
    Act Density 4.725%

    No Known Activations