INDEX
Explanations
terms related to women's issues and representation
Negative Logits
(es      -0.17
gaard    -0.17
ses      -0.17
eson     -0.16
leton    -0.15
istol    -0.15
vier     -0.15
bred     -0.15
iom      -0.14
ars      -0.14
Positive Logits
ifest    0.18
endez    0.17
opause   0.17
ÏĤ       0.16
-child   0.16
hood     0.16
/people  0.16
uele     0.15
ized     0.15
astery   0.15
Activations Density: 0.057%