INDEX
    Explanations

    instances of gender representation and biases in various contexts

    Negative Logits
    "ikh"       -0.17
    "anse"      -0.16
    "oron"      -0.16
    "utut"      -0.15
    "iverz"     -0.15
    "regn"      -0.15
    "تون"       -0.14
    "lover"     -0.14
    "inton"     -0.14
    "ushman"    -0.14
    Positive Logits
    " female"   0.48
    " male"     0.44
    " females"  0.41
    " Female"   0.39
    " gender"   0.38
    "male"      0.37
    "female"    0.36
    " males"    0.36
    "女性"      0.35
    " women"    0.35
    Act Density 0.158%

    No Known Activations