INDEX
    Explanations

    references to men or male-related terms

    Negative Logits
    Token           Logit
    "erc"           -0.15
    " FactoryBot"   -0.15
    "gart"          -0.15
    "Ī"             -0.15
    " (::"          -0.15
    "ighton"        -0.15
    "er"            -0.15
    "oslav"         -0.14
    "(es"           -0.14
    "лиц"           -0.14
    Positive Logits
    Token           Logit
    "opause"        0.28
    "cken"          0.25
    "endez"         0.24
    "iscal"         0.23
    "ager"          0.23
    "aced"          0.22
    "acing"         0.22
    "едж"           0.21
    "aces"          0.21
    "ubar"          0.21
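    On feature dashboards of this kind, the positive and negative logit lists are commonly obtained by multiplying the feature's decoder direction by the model's unembedding matrix and keeping the highest- and lowest-scoring vocabulary tokens. The page itself does not state this, so the sketch below only assumes that procedure; `logit_effects`, `top_and_bottom`, the token names, and the toy matrix are all hypothetical.

```python
# Minimal sketch, ASSUMING the logit lists come from projecting one feature's
# decoder direction through the unembedding matrix W_U (d_model x vocab).
# Function names, tokens, and numbers here are hypothetical toy values.

def logit_effects(decoder_dir, W_U):
    """Return one score per vocabulary token: decoder_dir @ W_U."""
    d_model = len(decoder_dir)
    vocab = len(W_U[0])
    return [
        sum(decoder_dir[r] * W_U[r][t] for r in range(d_model))
        for t in range(vocab)
    ]


def top_and_bottom(scores, tokens, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


if __name__ == "__main__":
    tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]   # hypothetical vocabulary
    decoder_dir = [1.0, 0.5]                        # hypothetical feature direction
    W_U = [[0.2, 0.1, -0.1, -0.2],                  # hypothetical unembedding
           [0.1, 0.2, -0.1, 0.0]]
    scores = logit_effects(decoder_dir, W_U)
    top, bottom = top_and_bottom(scores, tokens, k=2)
    print("positive:", [t for t, _ in top])
    print("negative:", [t for t, _ in bottom])
```

    Sorting once and slicing both ends keeps the example short; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` / `heapq.nsmallest` would avoid a full sort.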
    Activation Density: 0.021%
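    Activation density is usually read as the share of evaluated tokens on which the feature fires at all. The sketch below assumes "fires" means an activation strictly above zero; the function name and threshold are hypothetical, and the toy input is chosen only so the result matches the figure above (the page does not give the real token count).

```python
# Minimal sketch, ASSUMING density = percentage of tokens whose activation
# is strictly greater than a threshold (0.0 by default).
# `activation_density` is a hypothetical helper, not a library function.

def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Toy input: 21 nonzero activations among 100,000 tokens.
    acts = [0.0] * 100_000
    for i in range(21):
        acts[i * 1000] = 1.0
    print(f"{activation_density(acts):.3f}%")
```

    A density this low means the feature is silent on almost every token, which is why a dashboard can report a nonzero density yet have no stored activation examples to show.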

    No Known Activations