    Explanations

    references to women and female-related terms in various contexts

    Negative Logits
    ".googleapis"    -0.16
    "nik"            -0.15
    "cup"            -0.15
    "aro"            -0.15
    "uem"            -0.15
    "uels"           -0.14
    "auc"            -0.14
    " dirty"         -0.14
    "aja"            -0.14
    "ерин"           -0.14
    Positive Logits
    "či"             0.15
    "elts"           0.15
    "rial"           0.15
    "rary"           0.14
    "quarters"       0.14
    "cott"           0.13
    "ears"           0.13
    "ندا"            0.13
    "roupon"         0.13
    " alike"         0.13
    Activation Density: 0.021%

    No Known Activations