Add performance Regression test
Prototype for proposal #690 (closed) which adds an automized Performance Regression test for VTK-m
In this MR we implement the following changes:
- Creates a Performance test
- Uploads the results to CDASH
- Uploads historical result to a git repository.
- Adds a new build that performs the performance test
- Historical results can be checked in the output of the build
Automated Performance Regression tests
Overview
The design of the performance regression test is composed of the following components:
- The Kitware Gitlab instance which trigger the benchmark jobs when a git commit is pushed.
- Gitlab CI jobs for performing the benchmarks and for generating the comparison with the historical results.
- A Git repository that is used for storing the historical results.
- The Kitware CDASH instance which files, displays the performance report and inform the developer if a performance regression has occurred.
The performance regression test is performed whenever a git commit is pushed. The job performancetest which invoke the benchmark suite in a Gitlab runner job and later compare its results against the historical results, stored in CDASH, of its most immediate master ancestor. The results of this comparison are then displayed in a brief report in the form of a comment in its corresponding Gitlab merge-request.
Details
Selection of Benchmarks
While we can possibly run all of the provided benchmarks in the continuous build track to avoid potential performance and latency issues in the CI, I have initially limited the benchmark suites to:
- BenchmarkFilters
- BenchmarkInsitu
Benchmark ctest
We provide a CMake function named add_benchmark_test
which sets the
performance regression test for the given Google Benchmark suite. It admits one
argument to filter the number of benchmarks to be executed. If ran locally, it
will not upload the results to the online record repository.
Requirements
- Python3 with the SciPy package
- Benchmark tests will be enabled in a CMAKE build that sets both
VTKm_ENABLE_BENCHMARKS
andVTKm_ENABLE_PERFORMANCE_TESTING
New Gitlab Runner requirements
- It must have disabled every type of CPU scaling option both at the BIOS and
Kernel level (
cpugovern
). - It must provide a gitlab runner with a concurrency level 1 to avoid other jobs being scheduled while the benchmark is being executed.
How to make sense of the results
Results of both of the benchmark and the comparison against its most recent commit ancestor can be accessed in the CDASH Notes for the performance regression build. The CDASH Notes can be accessed by clicking the note-like miniature image in the build name column.
Performance Regressions test that report a performance failure are reported in
the form of a test failure of the test PerformanceTest($TestName)Report
. The
results of the comparison can be seen by clicking this failed test.
Performance regression test success is determined by the performance of a null hypothesis test with the hypothesis that the given tests performs similarly or better than the baseline test with a confidence level 1-alpha. If a pvalue is small enough (less than alpha), we reject the null hypothesis and we report that the current commit introduces a performance regression. By default we use a t-distribution with an alpha value of 5%. The pvalues can be seen in the uploaded reports.
The following parameters can be modified by editing the corresponding environmental variables:
- Alpha value:
VTKm_PERF_ALPHA
- Minimum number of repetitions for each benchmark:
VTKm_PERF_REPETITIONS
- Minimum time to spend for each benchmark:
VTKm_PERF_MIN_TIME
- Statistical distribution to use:
VTKm_PERF_DIST
Below is an example of this raw output of the comparison of the current commit against the baseline results:
Benchmark Time CPU Time Old Time New CPU Old CPU New
------------------------------------------------------------------------------------------------------------
BenchThreshold/manual_time +0.0043 +0.0036 73 73 92 92
BenchThreshold/manual_time +0.0074 +0.0060 73 73 91 92
BenchThreshold/manual_time -0.0003 -0.0007 73 73 92 92
BenchThreshold/manual_time -0.0019 -0.0018 73 73 92 92
BenchThreshold/manual_time -0.0021 -0.0017 73 73 92 92
BenchThreshold/manual_time +0.0001 +0.0006 73 73 92 92
BenchThreshold/manual_time +0.0033 +0.0031 73 73 92 92
BenchThreshold/manual_time -0.0071 -0.0057 73 73 92 92
BenchThreshold/manual_time -0.0050 -0.0041 73 73 92 92