INDEX
    Explanations

    references to gender roles and reproductive systems

    Negative Logits
    Token             Logit
    "脚注の使い方"      -0.81
    "UnsafeEnabled"   -0.78
    "MLLoader"        -0.71
    "DockStyle"       -0.71
    " оригіналу"      -0.68
    " snippetHide"    -0.64
    " bağlantılar"    -0.59
    "BORN"            -0.59
    "born"            -0.58
    "CommonModule"    -0.57
    Positive Logits
    Token             Logit
    " women"          1.03
    " WOMEN"          0.92
    "WOMEN"           0.91
    " Women"          0.90
    " gentlemen"      0.87
    " masculine"      0.85
    "Women"           0.85
    " masculinity"    0.84
    " ladies"         0.84
    " mascul"         0.83
    Activation Density: 0.355%

    No Known Activations