INDEX
    Explanations

    references to gender dynamics and roles within societal contexts

    Negative Logits
    ."+
    -0.56
    otomy
    -0.55
    pherals
    -0.55
    [])
    
    -0.55
    RIAGE
    -0.54
    StoreMessageInfo
    -0.54
    NUMX
    -0.54
    ."));
    -0.53
    ifrance
    -0.52
    irov
    -0.52
    Positive Logits
     huh
    1.22
     eh
    0.96
    Isn
    0.92
     isn
    0.91
     Isn
    0.87
     aren
    0.84
    isn
    0.82
     Wasn
    0.80
     prawda
    0.80
    Wasn
    0.79
    Activation Density: 0.218%

    No Known Activations