Compare and Swap (CAS) Sequence for BenchmarkAtomicArray Takes too long to Iterate
When running the CASSeq test in BenchmarkAtomicArray, a single iteration takes about 10 minutes for the single-value case (AtomicsValues:1). This suggests that the benchmark is implemented inefficiently and may not be measuring the functionality it is meant to stress. The benchmark should be refactored so that each iteration takes less time and the intended features are actually benchmarked.
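For context, here is a minimal host-side sketch of why the single-value case can be so slow. It is not the VTK-m benchmark code. It assumes the benchmark updates each value through a CAS retry loop, which this report does not show. The names `CasAdd`, `RunCasSequence`, and `CasResult` are hypothetical, and `std::atomic` with `std::thread` stands in for the device atomics.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical result type: final sum of all slots plus the failed CAS count.
struct CasResult
{
  std::uint64_t total;
  std::uint64_t retries;
};

// Atomically add `delta` to `slot` with a compare-and-swap retry loop.
// Returns how many CAS attempts failed before one succeeded.
inline std::uint64_t CasAdd(std::atomic<std::uint32_t>& slot, std::uint32_t delta)
{
  std::uint64_t failed = 0;
  std::uint32_t expected = slot.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak writes the current value into
  // `expected`, so the next attempt uses fresh data.
  while (!slot.compare_exchange_weak(expected, expected + delta,
                                     std::memory_order_relaxed))
  {
    ++failed;
  }
  return failed;
}

// Spread numOps CAS additions over numValues slots using numThreads threads.
// Precondition: numValues > 0. Fewer slots means more threads hit the same
// address, so more CAS attempts are expected to fail and retry.
inline CasResult RunCasSequence(std::size_t numValues, std::size_t numOps,
                                unsigned numThreads)
{
  std::vector<std::atomic<std::uint32_t>> slots(numValues);
  for (auto& s : slots)
  {
    s.store(0, std::memory_order_relaxed);
  }

  std::atomic<std::uint64_t> retries{ 0 };
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; ++t)
  {
    workers.emplace_back([&, t] {
      std::uint64_t localRetries = 0;
      for (std::size_t i = t; i < numOps; i += numThreads)
      {
        localRetries += CasAdd(slots[i % numValues], 1u);
      }
      retries.fetch_add(localRetries, std::memory_order_relaxed);
    });
  }
  for (auto& w : workers)
  {
    w.join();
  }

  std::uint64_t total = 0;
  for (auto& s : slots)
  {
    total += s.load(std::memory_order_relaxed);
  }
  return CasResult{ total, retries.load() };
}
```

On a multi-core machine, comparing `RunCasSequence(1, 1 << 20, 8).retries` with `RunCasSequence(4096, 1 << 20, 8).retries` would be expected to show far more retries for the single slot, although the exact counts depend on hardware and scheduling. On a GPU, many more threads contend for the same address, so the effect is likely to be much stronger.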
The following benchmark results were produced on the Sandia GPU cluster with the following configuration and command:
CUDA 10.2, GCC 7.3.0, Kokkos 3.4.01
./compare-benchmarks.py --benchmark1='../../build/bin/BenchmarkAtomicArray --vtkm-device Cuda' --benchmark2='../../build/bin/BenchmarkAtomicArray --vtkm-device Kokkos' -- benchmarks
Cuda results:
BenchCASSeq<unsigned int>/AtomicsValues:1/AtomicOps:33554432/manual_time 616023 ms 610317 ms 1 217.878k/s
BenchCASSeq<unsigned int>/AtomicsValues:8/AtomicOps:33554432/manual_time 72146 ms 71492 ms 1 1.86035M/s
BenchCASSeq<unsigned int>/AtomicsValues:64/AtomicOps:33554432/manual_time 10449 ms 10367 ms 1 12.8444M/s
BenchCASSeq<unsigned int>/AtomicsValues:512/AtomicOps:33554432/manual_time 256 ms 255 ms 3 524.776M/s
BenchCASSeq<unsigned int>/AtomicsValues:4096/AtomicOps:33554432/manual_time 8.20 ms 8.22 ms 85 16.3659G/s
BenchCASSeq<unsigned int>/AtomicsValues:32768/AtomicOps:33554432/manual_time 1.14 ms 1.15 ms 613 117.789G/s
BenchCASSeq<unsigned int>/AtomicsValues:262144/AtomicOps:33554432/manual_time 0.452 ms 0.461 ms 1551 297.178G/s
BenchCASSeq<unsigned int>/AtomicsValues:1048576/AtomicOps:33554432/manual_time 0.452 ms 0.461 ms 1544 297
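The output does not label the rate column. As a sanity check, the sketch below assumes it is bytes per second, with each atomic op touching one 4-byte `unsigned int`. Under that assumption the AtomicsValues:1 row works out to about 217.878k bytes/s, matching the reported figure, which corresponds to only about 54k CAS operations per second. The function names are hypothetical helpers, not part of the benchmark.

```cpp
#include <cstdint>

// Operations per second for a run of `atomicOps` operations that took
// `timeMs` milliseconds (the manual_time column).
inline double OpsPerSecond(std::uint64_t atomicOps, double timeMs)
{
  return static_cast<double>(atomicOps) / (timeMs / 1000.0);
}

// Bytes per second, ASSUMING each atomic op touches exactly one value of
// `valueBytes` bytes (4 for unsigned int on the reported platform).
inline double BytesPerSecond(std::uint64_t atomicOps, std::uint64_t valueBytes,
                             double timeMs)
{
  return OpsPerSecond(atomicOps, timeMs) * static_cast<double>(valueBytes);
}
```

For example, `BytesPerSecond(33554432, 4, 616023.0)` reproduces the first row's rate to within rounding of the displayed time.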
Edited by Nickolas Davis