INDEX
Explanations
references to anti-discrimination policies and practices
Negative Logits
Token      Logit
racially   -0.19
racist     -0.18
racism     -0.18
Rac        -0.18
racial     -0.17
_tD        -0.15
eg         -0.15
assi       -0.15
racing     -0.14
oltip      -0.14
Positive Logits
Token       Logit
veteran     0.28
pregnancy   0.24
Veteran     0.22
veter       0.22
creed       0.22
veterans    0.21
preg        0.20
protected   0.20
Protected   0.20
disability  0.20
Activations Density: 0.017%