INDEX
Explanations
terms related to gender, including gender-neutral or gender-based concepts
phrases related to gender and related policies
NEGATIVE LOGITS
shire     -0.74
hiba      -0.72
×ķ        -0.70
hower     -0.65
KC        -0.64
Bus       -0.64
deck      -0.64
Cheong    -0.64
ש         -0.64
NK        -0.64
POSITIVE LOGITS
itives            0.84
ethnicity         0.76
inheritance       0.75
ethnic            0.72
influences        0.71
Hispanic          0.71
minorities        0.70
ancestry          0.70
pronouns          0.69
representation    0.69
Activations Density 0.270%
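
The density figure above is usually read as the fraction of sampled tokens on which the feature has a non-zero activation. The page does not define the term, so this reading is an assumption. A minimal sketch of that calculation follows; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The source page does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, 27 of which activate the feature.
sample = [0.0] * 9973 + [1.5] * 27
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.270%
```

Under this reading, a density of 0.270% means the feature fires on roughly 27 of every 10,000 sampled tokens.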