Prevent compiler from optimizing out benchmark body.
Because the result of the Reduce call was never used, several compilers omitted the call entirely in release builds.
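For readers unfamiliar with the problem, here is a minimal sketch of one common way to keep an otherwise unused result alive. It is not necessarily what this change does: `KeepResult` and `BenchmarkBody` are hypothetical names, and `std::accumulate` stands in for the real Reduce call.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of VTK-m. A write to a volatile object is an
// observable side effect, so the compiler has to produce `value` and cannot
// discard the computation that feeds it. Intended for arithmetic types only.
template <typename T>
void KeepResult(const T& value)
{
  static volatile T sink;
  sink = value;
}

// Stand-in for the benchmarked Reduce call: sums `count` ones.
float BenchmarkBody(std::size_t count)
{
  assert(count > 0);
  std::vector<float> values(count, 1.0f);
  float result = std::accumulate(values.begin(), values.end(), 0.0f);

  // Without this line, and with the return value ignored by the caller,
  // an optimizing build may drop the reduction entirely.
  KeepResult(result);
  return result;
}
```

The volatile store costs one extra write per run, which is negligible next to a reduction over millions of values.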
FWIW, the TBB implementation shows pretty solid scaling once this issue is fixed:
Speedup | Warn | serial | parallel | Benchmark (Type) |
---|---|---|---|---|
7.107 | | 0.002597 +- 0.000185 | 0.000365 +- 0.000033 | Reduce on 2097152 values (vtkm::Float32) |
3.039 | ! | 0.002816 +- 0.000130 | 0.000927 +- 0.000058 | Reduce on 2097152 values (vtkm::Float64) |
1.972 | !!! | 0.000648 +- 0.000039 | 0.000328 +- 0.000037 | Reduce on 2097152 values (vtkm::Int32) |
1.421 | !!! | 0.001321 +- 0.000049 | 0.000929 +- 0.000059 | Reduce on 2097152 values (vtkm::Int64) |
1.983 | !!! | 0.000650 +- 0.000037 | 0.000328 +- 0.000038 | Reduce on 2097152 values (vtkm::UInt32) |
3.170 | ! | 0.000112 +- 0.000016 | 0.000035 +- 0.000003 | Reduce on 2097152 values (vtkm::UInt8) |
2.148 | !! | 0.004162 +- 0.000135 | 0.001938 +- 0.000071 | Reduce on 2097152 values (vtkm::Vec< vtkm::Float32, 4 >) |
1.488 | !!! | 0.004316 +- 0.000133 | 0.002900 +- 0.000080 | Reduce on 2097152 values (vtkm::Vec< vtkm::Float64, 3 >) |
1.229 | !!! | 0.001159 +- 0.000051 | 0.000943 +- 0.000060 | Reduce on 2097152 values (vtkm::Vec< vtkm::Int32, 2 >) |
1.495 | !!! | 0.001556 +- 0.000123 | 0.001041 +- 0.000029 | Reduce on 2097152 values (vtkm::Vec< vtkm::UInt8, 4 >) |
Edited by Allison Vacanti