INDEX
    Explanations

    references to women and their roles or characteristics

    Negative Logits
    Token       Logit
    eing        -0.16
    gaard       -0.15
    Ìģ          -0.15
    ذÙĩ         -0.14
    leton       -0.14
    posix       -0.14
    grund       -0.14
    emales      -0.14
     Fi         -0.13
    antar       -0.13
    Positive Logits
    Token       Logit
    hood        0.17
    Sharper     0.15
    ityEngine   0.14
    oi          0.14
    AMI         0.14
    lok         0.14
    omaly       0.14
    elijke      0.14
    XP          0.14
    agers       0.14
    Activation Density: 0.040%

    No Known Activations