INDEX
    Explanations

    terms related to gender, including gender-neutral or gender-based concepts

    phrases related to gender and related policies

    Negative Logits
    Token               Logit
    "shire"             -0.74
    "hiba"              -0.72
    "×ķ"                -0.70
    "hower"             -0.65
    "KC"                -0.64
    "Bus"               -0.64
    "deck"              -0.64
    " Cheong"           -0.64
    "ש"                 -0.64
    "NK"                -0.64
    Positive Logits
    Token               Logit
    "itives"            0.84
    " ethnicity"        0.76
    " inheritance"      0.75
    "ethnic"            0.72
    " influences"       0.71
    " Hispanic"         0.71
    " minorities"       0.70
    " ancestry"         0.70
    " pronouns"         0.69
    " representation"   0.69
    Act Density 0.270%

    No Known Activations