INDEX
    Explanations

    female names and pronouns

    Negative Logits
    " himself"    -5.00
    " his"        -3.80
    "himself"     -2.94
    " его"        -2.88
    " jego"       -2.86
    " seinem"     -2.30
    " himſelf"    -2.27
    " 그의"       -2.23
    " he"         -2.13
    "彼の"        -2.09
    Positive Logits
    " herself"    5.66
    " her"        3.55
    " she"        3.34
    "herself"     3.31
    " ее"         2.72
    " její"       2.64
    " करती"       2.56
                  2.56
    " ihrem"      2.41
    " её"         2.33
    Activation Density: 0.159%

    No Known Activations