INDEX
Explanations
instances of systemic bias, particularly in the context of gender inequality and institutional practices
Negative Logits
747      -0.14
ogan     -0.14
anders   -0.14
921      -0.14
трон     -0.13
raud     -0.13
onga     -0.13
920      -0.13
leigh    -0.12
ias      -0.12
Positive Logits
across       0.39
everywhere   0.31
both         0.31
Across       0.27
Across       0.26
wherever     0.26
both         0.25
throughout   0.24
ranging      0.23
både         0.23
Activations Density 0.358%