INDEX
    Explanations

    phrases related to societal norms and behavior

    negative assertions about gender dynamics and societal roles

    Negative Logits
    iHUD            -0.76
    Chain           -0.69
    hent            -0.66
     predecessor    -0.65
    éĹ              -0.64
    someone         -0.64
    client          -0.62
    ainment         -0.62
    plate           -0.61
    holder          -0.61
    Positive Logits
     outnumbered          1.07
     disproportionately   1.05
    joice                 0.96
     rejoice              0.90
     overwhelmingly       0.89
     flock                0.87
     dominate             0.86
     smarter              0.84
     behave               0.84
     enslaved             0.84
    Activation Density: 0.621%

    No Known Activations