Performance Regression Testing Automation: Design
Overview
The design of the performance regression test is composed of the following components:
- The Kitware GitLab instance, which triggers the benchmark jobs automatically on pushes to master and manually in merge requests.
- GitLab CI jobs for running the benchmarks and for generating the statistics and plots.
- The Kitware CDash instance, which stores the benchmark measurements for each master commit.
The performance regression test runs when the user triggers the manual `benchtest` job, which invokes the benchmark suite in a GitLab runner job and then compares its results against the historical results, stored in CDash, of its most immediate master ancestor. The outcome of this comparison is then presented as a brief report posted as a comment on the corresponding GitLab merge request.
Details
Selection of Benchmarks
While we could run all of the provided benchmarks in the continuous build track, to avoid potential performance and latency issues in the CI I propose to initially limit the benchmark suites to the following:
- BenchmarkFilters
- BenchmarkDeviceAdapters
Benchmark ctest
We must provide a CMake function named `add_benchmark_test` that registers the benchmark as a CTest test and uploads its corresponding output report to CDash. The CMake function should take only the name of the benchmark target.
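A minimal sketch of what such a function could look like is shown below; the output location, the `--benchmark_out` flags, and the use of the `ATTACHED_FILES` test property are assumptions for illustration, not the final implementation:

```cmake
# Sketch only: register a Google Benchmark executable as a CTest test and
# attach its JSON report so that a dashboard (CDash) submission uploads it.
function(add_benchmark_test benchmark)
  add_test(NAME ${benchmark}
           COMMAND ${benchmark}
                   --benchmark_out=${CMAKE_BINARY_DIR}/${benchmark}.json
                   --benchmark_out_format=json)

  # ATTACHED_FILES makes CTest upload the listed files to CDash together
  # with the test result when running in dashboard mode.
  set_tests_properties(${benchmark} PROPERTIES
    ATTACHED_FILES ${CMAKE_BINARY_DIR}/${benchmark}.json)
endfunction()
```

It would then be invoked as, for example, `add_benchmark_test(BenchmarkFilters)`.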
CMake options to enable this
- Benchmark tests will be enabled in a CMake build that sets both `VTKm_ENABLE_BENCHMARKS` and `VTKm_ENABLE_TESTING`.
New GitLab Runner requirements
- It must have every type of CPU frequency scaling disabled, at both the BIOS and the kernel level (the cpufreq governor).
- It must provide a GitLab runner with a concurrency level of 1 to avoid other jobs being scheduled while the benchmark is executing (see the example after this list).
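As an illustration only (the exact tooling differs per machine and distribution), the runner setup could look like:

```sh
# Pin the cpufreq governor so frequency scaling does not skew measurements.
sudo cpupower frequency-set --governor performance

# Limit the runner to one concurrent job in /etc/gitlab-runner/config.toml:
#   concurrent = 1
```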
GitLab job for running the benchmarks
The job should build the Google Benchmark based benchmark suites in Release mode with the appropriate benchmarking options enabled. After running the benchmarks, it must upload the results to the Kitware CDash instance.
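A rough sketch of what such a job might look like in `.gitlab-ci.yml` follows; the job name, runner tag, and exact `ctest` invocation are assumptions for illustration (the real job would follow VTK-m's existing CI layout), while the configure options come from the CMake options section above:

```yaml
# Illustrative sketch only.
benchtest:
  stage: test
  when: manual              # triggered by hand from the merge request
  tags: [benchmark]         # routed to the dedicated, concurrency-1 runner
  script:
    # Configure a Release build with benchmarks and testing enabled.
    - cmake -B build -S . -DCMAKE_BUILD_TYPE=Release
        -DVTKm_ENABLE_BENCHMARKS=ON -DVTKm_ENABLE_TESTING=ON
    - cmake --build build --parallel
    # Run the benchmark tests in dashboard mode so their reports are
    # submitted to the Kitware CDash instance.
    - ctest --test-dir build -R Benchmark -D Experimental
```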
GitLab job for generating the statistics and plots
The job must use a Docker image that provides a Python 3 distribution for the `compare.py` script, and possibly the R tidyverse stack to generate the plots and further statistics.
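For illustration, assuming `compare.py` is taken from the Google Benchmark sources under `tools/` and that the baseline report has already been fetched from CDash (the image name and all paths below are placeholders), the job could look like:

```yaml
# Illustrative sketch only.
benchtest-report:
  stage: report
  image: python:3           # placeholder; a custom image could also carry the R tidyverse stack
  script:
    # compare.py ships with Google Benchmark under tools/.
    - pip install -r benchmark/tools/requirements.txt
    # Compare the master ancestor's results against this branch's results.
    - python benchmark/tools/compare.py benchmarks baseline.json current.json | tee compare.txt
  artifacts:
    paths: [compare.txt]
```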
The job must write its output to the corresponding merge request in the following format:
In the first field we display the raw output of the Google Benchmark `compare.py` script for a few selected benchmarks. For this purpose, I propose to display only the output of the following benchmarks:
- Contour
- Gradient
- Threshold
- BenchSort
Below is an example of the shape of this raw output:
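The benchmark names and measurements in the snippet are placeholders, meant only to illustrate the column layout produced by `compare.py` (relative differences followed by old and new Time/CPU values):

```
Benchmark                      Time     CPU   Time Old   Time New   CPU Old   CPU New
--------------------------------------------------------------------------------------
BenchContour/manual_time    +0.0312 +0.0315       1520       1567      1498      1545
BenchGradient/manual_time   -0.0045 -0.0042       2301       2291      2275      2266
```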
Tasklists
- Create the CTest test that runs the desired benchmarks and captures their output files.
- Create the GitLab CI job that runs the specified benchmarks and uploads the results to CDash.
- Create the GitLab CI job that generates the statistics (using the Google Benchmark `compare.py` script) and also generates plots to help the user interpret the results.