INDEX
    Explanations

    cultural references and terms related to social roles and status

    Negative Logits
     kvinnor  -0.70
     kvinder  -0.70
     vrouwen  -0.69
     ženy  -0.66
    Women  -0.66
    women  -0.64
    Girls  -0.63
     féminine  -0.63
     women  -0.62
     женщин  -0.62
    Positive Logits
     spin  0.53
     dow  0.48
     courtes  0.46
    spin  0.44
     maiden  0.44
     seam  0.44
     Dow  0.42
     ancest  0.40
     Spin  0.40
     روس  0.39
    Activation Density 0.452%

    No Known Activations