INDEX
    Explanations

    references to gender roles and stereotypes related to masculinity

    Negative Logits
    ,strlen         -0.14
    igin            -0.14
    ]={↵            -0.13
    .enumer         -0.13
    ÙĦس             -0.13
     Türkçe         -0.13
    _banner         -0.13
    itech           -0.13
    anvas           -0.13
     Shield         -0.13
    Positive Logits
     society        0.31
     norms          0.27
     expectations   0.27
     pressure       0.26
     conformity     0.26
     Pressure       0.25
     societal       0.24
    norm            0.24
     Society        0.24
     expectation    0.23
    Activation Density: 0.173%

    No Known Activations