INDEX
    Explanations

    words related to specific groups of people, such as "People", "Men", and "Women"

    references to different groups of people, specifically highlighting gender and identity

    Negative Logits
    " dads"          -0.69
    "enegger"        -0.67
    " husbands"      -0.64
    " principals"    -0.64
    " whipping"      -0.63
    " fathers"       -0.62
    " backbone"      -0.62
    " parents"       -0.61
    " stump"         -0.60
    " pedoph"        -0.60
    Positive Logits
    "ysc"            0.82
    "MpServer"       0.81
    "ettlement"      0.77
    " Eater"         0.75
    "rights"         0.75
    " Ago"           0.75
    "ascript"        0.73
    "Soft"           0.72
    " Killed"        0.72
    " Own"           0.72
    Activation Density: 0.094%

    No Known Activations