INDEX
    Explanations

    terms related to gender, including gender equality, gender identity, and gender roles in society

    Negative Logits
    Token           Logit
    "vip"           -0.16
    ".metamodel"    -0.15
    "sher"          -0.15
    "_RING"         -0.15
    "rish"          -0.14
    "yling"         -0.14
    "lashes"        -0.14
    "ãĤ¯ãĤ»"        -0.14
    "yan"           -0.14
    "signature"     -0.14
    Positive Logits
    Token           Logit
    "ed"             0.37
    " roles"         0.28
    "que"            0.27
    "-neutral"       0.26
    " neutral"       0.26
    "fluid"          0.25
    "edn"            0.24
    "neutral"        0.23
    "-fluid"         0.23
    " Roles"         0.22
    Activation Density: 0.013%

    No Known Activations