INDEX
Explanations
references to historical firsts related to gender and race in leadership positions
Negative Logits
lán      -0.18
istar    -0.17
imal     -0.17
hei      -0.15
Unary    -0.15
Walton   -0.14
Aux      -0.14
gesi     -0.14
bem      -0.14
TRS      -0.14
Positive Logits
ever     0.15
-ever    0.14
ever     0.14
owie     0.14
points   0.14
IDGET    0.14
aldo     0.14
تÙĪÙĨ    0.14
Points   0.13
elly     0.13
Activations Density: 0.032%