INDEX
Explanations
phrases related to social issues, especially outdated social norms and gender biases
references to societal norms and the effects of gender bias
Negative Logits
Token         Logit
endment       -0.87
etsk          -0.84
ospons        -0.83
hire          -0.81
amen          -0.79
ppa           -0.76
osponsors     -0.73
hots          -0.72
Schn          -0.72
earch         -0.71
Positive Logits
Token         Logit
stereotypes   1.74
prejudices    1.67
notions       1.66
precon        1.65
assumptions   1.61
norms         1.49
stereotype    1.47
beliefs       1.45
biases        1.40
myths         1.39
Activations Density 0.408%
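
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed, assuming it means the fraction of token positions on which the feature fires (activation above zero). That reading is an assumption; the page does not define the metric. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is active.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 1000 token positions, of which 4 have a nonzero activation.
example = [0.0] * 996 + [0.7, 1.2, 0.3, 2.1]
density = activation_density(example)
print(f"{density:.3%}")  # prints 0.400%
```

Under this reading, a density of 0.408% would mean the feature is active on roughly 4 of every 1000 token positions.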