FR: first-class benchmark support
I propose that benchmarks should gain first-class support in CMake, being treated similarly to tests. CTest would be used to run benchmarks, and CDash would be used to view benchmark results.
Rationale and motivation
Benchmarking is an important part of software development, particularly for high-performance and real-time applications. Running benchmarks with CMake/CTest's current capabilities is possible, but usually requires custom scripting logic. Currently, there are two primary ways to add and run benchmarks with CMake:
- Add a custom target that will execute your benchmarks
- Add a CTest test that will execute your benchmarks
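For illustration, the second workaround might look like the following today (the target and test names here are invented):

```cmake
# Current workaround: register a benchmark executable as an ordinary
# CTest test. "run_benchmarks" is a hypothetical benchmark runner.
add_executable(run_benchmarks benchmarks/main.cpp)

add_test(NAME my_benchmark COMMAND run_benchmarks)

# Keep benchmarks out of the default test run by labeling them,
# so they execute only via `ctest -L benchmark`:
set_tests_properties(my_benchmark PROPERTIES LABELS "benchmark")
```

This works, but nothing in it tells CMake or CDash that `my_benchmark` measures performance rather than correctness.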
In either case, the programmer's intent could be more clearly expressed if benchmarks were first-class entities in CMake. CMake can also provide some functionality to better support viewing benchmark results and comparing results to previous results to detect regressions.
If CMake/CTest provides first-class support for benchmarks, I believe this would encourage more projects to integrate benchmarks into their workflows, and would make it much easier for projects currently implementing custom benchmarking logic to eliminate custom code and instead use CMake-provided facilities.
Requirements
I think there are two primary kinds of benchmarks:
- Benchmarks that must hit a certain absolute performance threshold
- Benchmarks that are more interested in detecting performance regressions
In either case, the programmer should specify benchmarks in their CMakeLists.txt files much as tests are specified today, and should run `ctest` to execute benchmarks and receive a report of which benchmarks passed or failed. CDash should display benchmark results, and perhaps even a historical graph of benchmark results.
For the first kind of benchmark, CTest should report a failure if the benchmark's execution time is greater than the specified absolute time threshold.
For the second kind of benchmark, CTest should run the benchmark and compare the results to a historical record of results for this benchmark, and report a failure if there is a statistically significant increase in execution time. Questions for implementation include:
- how to store this historical data for comparison
- how CTest should determine if a regression is statistically significant
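As a sketch of one possible answer to both questions, CTest could persist past timings in a small JSON file and flag a regression when a new timing exceeds the historical mean by some number of standard deviations. The file name, sigma count, and sample minimum below are all invented for illustration:

```python
import json
import statistics
from pathlib import Path

HISTORY_FILE = Path("benchmark_history.json")  # hypothetical storage location


def is_regression(name: str, elapsed_ms: float, sigmas: float = 3.0,
                  min_samples: int = 5) -> bool:
    """Flag a regression when elapsed_ms exceeds mean + sigmas * stdev
    of the recorded history. With too few samples, never flag."""
    history = {}
    if HISTORY_FILE.exists():
        history = json.loads(HISTORY_FILE.read_text())
    samples = history.get(name, [])
    regressed = False
    if len(samples) >= min_samples:
        mean = statistics.mean(samples)
        stdev = statistics.stdev(samples)
        regressed = elapsed_ms > mean + sigmas * stdev
    # Record the new sample so future runs compare against it too.
    history[name] = (samples + [elapsed_ms])[-100:]  # cap history length
    HISTORY_FILE.write_text(json.dumps(history))
    return regressed
```

A real implementation would also need to decide where this file lives relative to the build tree and how it interacts with CDash submissions.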
Some benchmarks may fall into both categories: the programmer may require the benchmark to always pass an absolute time threshold, and also to not regress in performance over time.
Proposed changes
I propose a new CMake command, `add_benchmark`. To support either kind of benchmark mentioned above, I propose the following signature:
```
add_benchmark(NAME <name>
              COMMAND <command>
              [ABSOLUTE <ms>] [REGRESSION]
              [CONFIGURATIONS <config...>]
              [WORKING_DIRECTORY <dir>]
              [COMMAND_EXPAND_LISTS])
```
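Under this proposal, usage might look like the following (the benchmark names and commands are made up):

```cmake
# Hypothetical usage of the proposed add_benchmark() command.
# Fail if sort_bench takes longer than 500 ms:
add_benchmark(NAME sort_bench
              COMMAND sort_bench --iterations 1000
              ABSOLUTE 500)

# Track parse_bench against its own history to catch regressions:
add_benchmark(NAME parse_bench
              COMMAND parse_bench
              REGRESSION)
```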
All of the arguments would have the same meaning and semantics as in the `add_test` command, except for the new `ABSOLUTE` and `REGRESSION` options. `REGRESSION` means that CTest will check the results against the historical data, and `ABSOLUTE` means that CTest will report a failure if the benchmark's execution time is greater than the specified number of milliseconds. Both options may be specified together; if neither is specified, `REGRESSION` would be the default.
Benchmarks would have all the same properties as tests. New properties for benchmarks would include:
- `ABSOLUTE_TIME`: the number of milliseconds the benchmark is required to complete in; empty/unset for benchmarks with no such requirement.
- `REGRESSION`: a boolean indicating whether this benchmark is regression-tested against previous results.
There could also be some properties for controlling how exactly CTest determines if a statistically significant regression occurred.
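If benchmarks mirror the test property model, tuning could reuse a `set_property`-style interface. The `BENCHMARK` scope and the `REGRESSION_SIGMA` property below are speculative, not part of any existing CMake API:

```cmake
# Speculative property interface, mirroring set_property(TEST ...).
set_property(BENCHMARK sort_bench PROPERTY ABSOLUTE_TIME 500)

# Hypothetical knob: how many standard deviations above the historical
# mean counts as a statistically significant regression.
set_property(BENCHMARK parse_bench PROPERTY REGRESSION_SIGMA 3)
```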
I also propose that the CTest dashboard client gain a new step called `Benchmark`, to be executed after the `Test` step. This step would simply be skipped if the project specifies no benchmarks; only tests would be executed during the `Test` step.
One implementation question is exactly how CTest retrieves timing information from the benchmark. CTest could define a spec for the benchmark to emit this information on its standard output, similar to the "additional test measurements" feature available today. If the benchmark does not print this more detailed information, CTest would simply use the total execution time of the benchmark command.
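For comparison, the existing additional-test-measurements feature parses XML-like tags from a test's stdout; a benchmark-timing spec might look similar. The measurement name `benchmark_time_ms` below is invented, not an existing CTest convention:

```python
import time


def format_measurement(name: str, value: float) -> str:
    """Format a value in the style of CTest's existing
    "additional test measurements" stdout convention."""
    return (f'<CTestMeasurement type="numeric/double" '
            f'name="{name}">{value:.3f}</CTestMeasurement>')


def run_benchmark() -> float:
    """Toy workload; a real benchmark would time the code under test."""
    start = time.perf_counter()
    sum(i * i for i in range(100_000))
    return (time.perf_counter() - start) * 1000.0  # milliseconds


if __name__ == "__main__":
    # The benchmark reports its own timing; CTest would parse this line.
    print(format_measurement("benchmark_time_ms", run_benchmark()))
```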