INDEX
    Explanations

    adjectives associated with gender characteristics, particularly masculine and feminine

    references to gender roles, specifically masculine and feminine attributes

    Negative Logits
    Token          Logit
    "Assembly"     -0.86
    "Deal"         -0.77
    "Reviewer"     -0.75
    " Tour"        -0.73
    "EV"           -0.72
    "oulos"        -0.69
    "eding"        -0.68
    "Reviewed"     -0.68
    "EVA"          -0.67
    "Publisher"    -0.67
    Positive Logits
    Token             Logit
    " masculinity"    1.15
    " masculine"      1.14
    " feminine"       1.00
    " mascul"         0.99
    " femin"          0.87
    "istries"         0.86
    " citiz"          0.83
    "femin"           0.82
    "WithNo"          0.80
    " submar"         0.80
    Activation Density: 0.015%

    No Known Activations