INDEX
Explanations
gendered references, specifically focusing on the mention of men and women
mentions of women and their respective contexts in various topics
Negative Logits
imum         -0.75
ËĪ           -0.68
Dur          -0.67
MN           -0.66
Coun         -0.66
Minnesota    -0.66
San          -0.65
Arbor        -0.65
Rack         -0.65
Gi           -0.64
Positive Logits
alike         1.53
respectively  1.01
striped       0.98
combatants    0.92
faiths        0.81
halves        0.77
separated     0.73
conver        0.71
exchanged     0.69
separately    0.68
Activations Density: 0.149%