INDEX
Explanations
words related to gender and gender-neutral concepts
terms related to gender inclusivity and related social issues
Negative Logits
Token       Logit
hiba        -0.75
Bus         -0.73
Marriott    -0.70
shire       -0.69
KC          -0.69
Js          -0.67
dry         -0.66
Niet        -0.64
akura       -0.64
Reviewer    -0.63
Positive Logits
Token         Logit
ethnicity     0.83
pronouns      0.80
genders       0.78
inheritance   0.74
identifiers   0.73
influences    0.73
minorities    0.72
ensl          0.71
chromosomes   0.71
itives        0.70
Activations Density: 0.217%