INDEX
    Explanations

    references to men and male-related terms

    Negative Logits
     FactoryBot  -0.17
    Sad          -0.17
    ernote       -0.16
    imler        -0.15
    atik         -0.15
    piring       -0.15
    AML          -0.14
    lassen       -0.14
     Ley         -0.14
    rát          -0.14
    Positive Logits
    opause  0.32
    endez   0.26
    orca    0.25
    ager    0.23
    ubar    0.23
    cken    0.23
    udo     0.23
    elik    0.22
    ninger  0.22
    acing   0.22
    Activation Density: 0.014%

    No Known Activations