Explanations
terms related to gender or race issues and their intersections with discrimination
Negative Logits
Token            Logit
eseorang         -0.42
ddelweddau       -0.42
хьтан            -0.41
Ecotoxicity      -0.40
borboleta        -0.40
ainville         -0.39
OwnerId          -0.39
schop            -0.39
nė               -0.38
MethodManager    -0.38
Positive Logits
Token              Logit
equality           0.49
composition        0.45
relations          0.44
identity           0.43
disparity          0.43
stratification     0.43
bender             0.43
differences        0.42
principalColumn    0.41
identities         0.40
Activations Density: 0.288%