INDEX
    Explanations

    gender-related words and phrases, including concepts of male dominance, female submission, and gender roles

    Negative Logits
     makro      -0.78
     kaos       -0.73
     aton       -0.69
     kram       -0.67
     lele       -0.66
     fortn      -0.66
     teras      -0.65
     saba       -0.65
     usta       -0.65
     saar       -0.65
    Positive Logits
     husbands    0.56
     women       0.54
    توضیحات      0.52
     husband     0.52
     wives       0.51
     Mulher      0.50
     homemaker   0.50
     herself     0.50
    obiety       0.49
    kaufs        0.48
    Activation Density: 0.413%

    No Known Activations