INDEX
    Explanations

    gendered references, specifically focusing on the mention of men and women

    mentions of women and their respective contexts in various topics

    Negative Logits
    Token             Logit
    "imum"            -0.75
    "ËĪ"              -0.68
    "Dur"             -0.67
    "MN"              -0.66
    "Coun"            -0.66
    "Minnesota"       -0.66
    "San"             -0.65
    " Arbor"          -0.65
    " Rack"           -0.65
    "Gi"              -0.64
    Positive Logits
    Token             Logit
    " alike"          1.53
    " respectively"   1.01
    " striped"        0.98
    " combatants"     0.92
    " faiths"         0.81
    " halves"         0.77
    " separated"      0.73
    " conver"         0.71
    " exchanged"      0.69
    " separately"     0.68
    Activation Density: 0.149%

    No Known Activations