In an increasingly interconnected and cyber-physical world, complexity is often cited as the root cause of adverse project outcomes, including cost overruns and schedule delays. This realization has prompted calls for better complexity management, which hinges on the ability to recognize and measure complexity early in the design process. However, while numerous complexity measures (CMs) have been proposed, there is limited agreement about how complexity should be measured and what a good measure should entail. In this paper, we propose a framework for benchmarking CMs in terms of how well they detect systematic variation along key aspects of complexity growth. Specifically, the literature consistently expects complexity growth to be correlated with increases in the size, number of interconnections, and randomness of the system architecture. Therefore, to neutrally compare six representative CMs, we synthetically create a set of system architectures that vary systematically along each of these three dimensions. We find that none of the measures detects changes in all three dimensions simultaneously, though several respond consistently to one or two. We also find a dichotomy in the literature regarding the archetype of system considered complex: CMs developed by researchers focused on physics-based systems (e.g., aircraft) tend to emphasize interconnectedness and structure, whereas CMs developed by those focused on flow-based systems (e.g., the power grid) emphasize size. Our findings emphasize the need for more careful validation of proposed measures. Our framework provides a path toward shared progress on the goal of better complexity management.
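
To make the benchmarking idea concrete, the following is a minimal illustrative sketch and not the generation procedure used in the paper. It assumes an architecture can be represented as an undirected graph and uses hypothetical knobs for the three dimensions: `n` for size, `k` for interconnections per component, and `p` for randomness (the probability of rewiring an edge of a regular ring lattice). The function `placeholder_measure` is a stand-in (a plain edge count) and is not one of the six CMs compared here.

```python
import random


def make_architecture(n, k, p, seed=0):
    """Return an undirected edge set over n components (hypothetical generator).

    Starts from a ring lattice in which every component links to its k
    nearest neighbours on each side, then rewires each edge with
    probability p. The edge count is preserved, so p changes only the
    regularity of the structure, not its size or density.
    """
    if n <= 2 * k:
        raise ValueError("need n > 2*k so that lattice edges are distinct")
    rng = random.Random(seed)

    def key(a, b):
        return (min(a, b), max(a, b))

    lattice = {key(i, (i + s) % n) for i in range(n) for s in range(1, k + 1)}
    result = set()
    for i, j in sorted(lattice):
        if rng.random() < p:
            # New endpoints must avoid self-loops, lattice edges, and
            # edges already placed, so no two edges ever collide.
            options = [t for t in range(n)
                       if t != i
                       and key(i, t) not in lattice
                       and key(i, t) not in result]
            if options:
                result.add(key(i, rng.choice(options)))
                continue
        result.add((i, j))
    return result


def placeholder_measure(edges):
    """Stand-in measure (edge count); NOT one of the six CMs in the paper."""
    return len(edges)


def sweep(measure=placeholder_measure):
    """Vary one dimension at a time around a baseline and record the measure."""
    base = {"n": 20, "k": 2, "p": 0.0}
    grid = {"n": [10, 20, 40], "k": [1, 2, 4], "p": [0.0, 0.5, 1.0]}
    return {
        dim: [measure(make_architecture(**{**base, dim: v})) for v in values]
        for dim, values in grid.items()
    }


if __name__ == "__main__":
    for dim, values in sweep().items():
        print(dim, values)
```

With the edge count as the stand-in, the sweep rises along `n` and `k` but stays flat along `p`, which is the kind of partial sensitivity the benchmark is meant to expose. Any candidate measure that accepts an edge set could be passed to `sweep` in place of the placeholder.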