INDEX
    Explanations

    identifying male individuals

    Negative Logits
    " ಹೊಂದಿ"        0.43
    " WOMEN"        0.42
                    0.42
    "women"         0.41
                    0.40
    " women"        0.38
    " menstrual"    0.38
    " زنان"         0.38
    " amén"         0.38
    " Jij"          0.38
    Positive Logits
    " guy"          3.73
    " guys"         3.48
    "Guy"           3.39
    " Guy"          3.38
    "guy"           3.27
    "guys"          3.22
    " Guys"         3.16
    "Guys"          3.03
    "家伙"          2.17
    " dudes"        2.13
    Act Density 0.020%

    No Known Activations