INDEX
    Explanations

    references to women or females in various contexts, such as combat roles, management positions, cosmetic surgery, and political preferences

    Negative Logits
    "REDACTED"        -0.89
    "UFF"             -0.81
    "-+-+"            -0.76
    "RAY"             -0.75
    "ebus"            -0.74
    "REC"             -0.72
    "EMA"             -0.71
    "æĸ¹" (方)        -0.71
    "rador"           -0.70
    "eme"             -0.70
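The negative-logit token `æĸ¹` is a raw byte-level string rather than readable text. Assuming the model uses a GPT-2-style byte-level BPE vocabulary (the page does not name the model or tokenizer), each displayed character stands for one byte, and reversing the byte-to-character table recovers the text: `æĸ¹` is the three UTF-8 bytes of 方. The sketch below rebuilds that table; `bytes_to_unicode` mirrors the GPT-2 mapping, while `decode_token` is a hypothetical helper, not part of any dashboard API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the GPT-2 byte-to-character table.

    Printable Latin-1 bytes map to themselves; every other byte is shifted
    to code point 256 + n, in increasing byte order.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    n = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + n)
            n += 1
    return table


def decode_token(token: str) -> str:
    """Map a displayed byte-level token string back to the text it encodes."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[c] for c in token)
    # A token may hold only part of a multi-byte character, so do not fail on it.
    return raw.decode("utf-8", errors="replace")


print(decode_token("æĸ¹"))                 # 方
print(repr(decode_token("Ġempowerment")))  # ' empowerment'
```

Tokens that carry only part of a multi-byte character cannot be decoded on their own, which is one plausible reason dashboards sometimes fall back to showing the raw byte-level form.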
    Positive Logits
    "folk"            1.24
    " empowerment"    1.03
    " genital"        0.99
    "opausal"         0.94
    " breasts"        0.93
    " menstru"        0.92
    "hood"            0.87
    " contraceptive"  0.87
    " reproductive"   0.86
    "volent"          0.83
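The positive and negative logit lists rank vocabulary tokens by how strongly this feature pushes the model's output toward or away from them. A common way to build such lists for a sparse-autoencoder feature, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the unembedding matrix and keep the k lowest and k highest scores. This is a minimal sketch in plain Python with made-up toy numbers; `top_logits` and its arguments are hypothetical, and any final LayerNorm or scaling the real dashboard may apply is ignored.

```python
def top_logits(decoder_vec, unembed, vocab, k=10):
    """Rank tokens by the logit contribution of one feature direction.

    decoder_vec: length-d_model list (the feature's decoder row)
    unembed:     d_model x d_vocab nested list (unembedding matrix)
    vocab:       length-d_vocab list of token strings
    Returns (most_negative, most_positive), each a list of (token, logit).
    """
    d_model = len(decoder_vec)
    # logit[t] = sum over i of decoder_vec[i] * unembed[i][t]
    scores = [
        sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


# Toy example with made-up numbers: d_model = 2, four-token vocabulary.
decoder_vec = [1.0, -0.5]
unembed = [
    [0.2, -0.4, 1.0, 0.0],
    [0.6, 0.2, -0.2, 0.4],
]
vocab = ["a", "b", "c", "d"]
neg, pos = top_logits(decoder_vec, unembed, vocab, k=2)
print([tok for tok, _ in neg])  # ['b', 'd']
print([tok for tok, _ in pos])  # ['c', 'a']
```

With a real model, `unembed` would be the d_model × d_vocab unembedding matrix and `vocab` the tokenizer's token strings; the toy values say nothing about this feature.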
    Activation Density: 0.071%
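Activation density is read here as the fraction of tokens on which the feature fires, so 0.071% corresponds to roughly 71 activating tokens per 100,000. The sketch assumes "fires" means an activation strictly above zero; the dashboard's actual threshold and token sample are not stated on this page. `activation_density` is a hypothetical helper, and the sample below is synthetic.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    threshold=0.0 is an assumption: a feature counts as firing on a token
    only when its activation is strictly positive.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Synthetic sample: 71 firing tokens out of 100,000.
acts = [0.0] * 99_929 + [1.0] * 71
density = activation_density(acts)
print(f"{density:.3%}")  # 0.071%
```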

    No Known Activations