INDEX
    Explanations

    gender, women

    Negative Logits
    Token         Logit
    " compute"    -0.09
    "Nearest"     -0.07
    "]))"         -0.07
    "Compute"     -0.07
    "ρη"          -0.07
    " Leb"        -0.07
    "וש"          -0.07
                  -0.07
    "compute"     -0.07
    " agony"      -0.07
    Positive Logits
    Token         Logit
    "女性"        0.19
    " 여성"       0.18
    " 女性"       0.18
    " మహిళ"       0.18
    " women"      0.18
    " females"    0.18
    " female"     0.17
    "Women"       0.17
    " महिलाओं"     0.17
    " Women"      0.17
    Activation Density: 0.164%

    No Known Activations