INDEX
    Explanations

    Male pronoun

    Negative Logits
    Token            Logit
    "_ins"           -0.07
    "691"            -0.07
    " parameter"     -0.07
    " women"         -0.07
    ".Weight"        -0.07
    " prize"         -0.07
    " asteroids"     -0.06
    " achter"        -0.06
    " IndexError"    -0.06
    " Baş"           -0.06
    Positive Logits
    Token            Logit
    " his"           0.08
    " seine"         0.08
    " him"           0.07
    "His"            0.07
    " هو"            0.07
    " Agree"         0.06
    "ωσε"            0.06
    (missing)        0.06
    " SCP"           0.06
    " văn"           0.06
    Activation Density: 0.424%

    No Known Activations