Explanations
terms related to gender
mentions of gender and related disparities or issues
Negative Logits
Token       Logit
BLIC        -0.76
Warrant     -0.72
GOODMAN     -0.69
Gerr        -0.69
Showtime    -0.68
steen       -0.68
Emanuel     -0.66
Brees       -0.66
amina       -0.66
Parish      -0.65
Positive Logits
Token       Logit
dysph       1.11
pronouns    0.89
bender      0.89
equality    0.89
endered     0.86
Equality    0.86
fuck        0.84
genders     0.84
puter       0.81
imbalance   0.81
Activations Density: 0.016%