Explanations
book recommendations based on gender stereotypes
phrases related to education and gender disparities
Negative Logits
Footnote      -0.71
Tier          -0.68
76561         -0.68
ADRA          -0.66
everal        -0.66
Register      -0.65
Save          -0.65
Site          -0.65
especially    -0.65
idan          -0.64
Positive Logits
non           0.74
theirs        0.73
nons          0.73
wine          0.72
automobiles   0.70
ours          0.70
rapes         0.70
hers          0.69
swords        0.69
guns          0.68
Activations Density: 0.395%