INDEX
    Explanations

    adjectives related to gender characteristics

    terms related to gender identity and expressions of femininity and masculinity

    Negative Logits
    Token           Logit
    "Assembly"      -0.82
    "oard"          -0.81
    "undo"          -0.79
    "RAY"           -0.72
    " Redemption"   -0.71
    "oulos"         -0.70
    "Grant"         -0.66
    "Stone"         -0.64
    "owitz"         -0.63
    "rave"          -0.62
    Positive Logits
    Token           Logit
    "istries"       0.89
    "inity"         0.86
    " feminine"     0.78
    " masculine"    0.77
    " fem"          0.77
    "女"            0.77
    " hygiene"      0.75
    "xual"          0.74
    " pronouns"     0.72
    "inant"         0.68
    Activation Density: 0.044%

    No Known Activations