INDEX
    Explanations

    references to gender, particularly the presence of men and women in various contexts

    Negative Logits
    нь
    -0.15
    utan
    -0.15
    ollipop
    -0.15
    VIC
    -0.15
    oron
    -0.15
    lesi
    -0.14
    Male
    -0.14
    дром
    -0.14
    icina
    -0.14
    ritis
    -0.14
    Positive Logits
     women
    0.59
     woman
    0.54
    women
    0.49
     Women
    0.43
    Women
    0.42
     WOM
    0.42
    女人
    0.41
    woman
    0.41
     Woman
    0.40
     mujeres
    0.38
    Act Density 0.036%

    No Known Activations