Add TBB specialization of ReduceByKey.
The (extremely efficient) serial algorithm still outperforms this specialization in certain cases, but the specialization is an order of magnitude faster than the generic implementation TBB was using. More than 4 cores may be needed before the parallel speedup overcomes the TBB overhead. Grain size does not seem to affect performance significantly.
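For reference, here is a minimal serial sketch of what a reduce-by-key operation computes. It assumes the usual semantics, where each run of consecutive equal keys collapses to one output key and one reduced value. `reduce_by_key_serial` is a hypothetical helper written against the standard library only. It is not the VTK-m serial algorithm and not the TBB specialization added here.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration only; not part of VTK-m or TBB.
// Collapses each run of consecutive equal keys into a single output key
// and combines the matching values with the binary operator `op`.
template <typename Key, typename Value, typename BinaryOp>
std::pair<std::vector<Key>, std::vector<Value>>
reduce_by_key_serial(const std::vector<Key>& keys,
                     const std::vector<Value>& values,
                     BinaryOp op)
{
  // Precondition: one value per key.
  assert(keys.size() == values.size());

  std::vector<Key> outKeys;
  std::vector<Value> outValues;
  for (std::size_t i = 0; i < keys.size(); ++i)
  {
    if (!outKeys.empty() && outKeys.back() == keys[i])
    {
      // Same key as the current run: fold the value into the running result.
      outValues.back() = op(outValues.back(), values[i]);
    }
    else
    {
      // Key changed (or first element): start a new run.
      outKeys.push_back(keys[i]);
      outValues.push_back(values[i]);
    }
  }
  return { std::move(outKeys), std::move(outValues) };
}
```

A single pass like this does very little work per element, which is consistent with the serial version being hard to beat. A parallel version additionally has to handle key runs that straddle the boundaries between blocks processed by different threads.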
Compared to serial (Speedup is serial / parallel, so values below 1 mean the TBB version is slower):
Speedup | Warn | serial | parallel | Benchmark (Type) |
---|---|---|---|---|
1.243 | !!! | 0.002572 +- 0.000070 | 0.002069 +- 0.000048 | ReduceByKey on 2097152 values with 104857 distinct vtkm::Id keys (vtkm::Float32) |
1.045 | !!! | 0.002940 +- 0.000056 | 0.002814 +- 0.000053 | ReduceByKey on 2097152 values with 104857 distinct vtkm::Id keys (vtkm::Float64) |
1.233 | !!! | 0.002657 +- 0.000043 | 0.002155 +- 0.000142 | ReduceByKey on 2097152 values with 104857 distinct vtkm::Id keys (vtkm::Int32) |
1.092 | !!! | 0.003037 +- 0.000075 | 0.002780 +- 0.000052 | ReduceByKey on 2097152 values with 104857 distinct vtkm::Id keys (vtkm::Int64) |
1.059 | !!! | 0.002318 +- 0.000036 | 0.002189 +- 0.000106 | ReduceByKey on 2097152 values with 104857 distinct vtkm::Id keys (vtkm::UInt32) |
1.621 | !!! | 0.002511 +- 0.000047 | 0.001549 +- 0.000041 | ReduceByKey on 2097152 values with 104857 distinct vtkm::Id keys (vtkm::UInt8) |
0.978 | !!!! | 0.004044 +- 0.000081 | 0.004133 +- 0.000085 | ReduceByKey on 2097152 values with 104857 distinct vtkm::Id keys (vtkm::Vec< vtkm::Float32, 4 >) |
0.849 | !!!! | 0.004686 +- 0.000071 | 0.005520 +- 0.000112 | ReduceByKey on 2097152 values with 104857 distinct vtkm::Id keys (vtkm::Vec< vtkm::Float64, 3 >) |
1.241 | !!! | 0.003457 +- 0.000135 | 0.002786 +- 0.000048 | ReduceByKey on 2097152 values with 104857 distinct vtkm::Id keys (vtkm::Vec< vtkm::Int32, 2 >) |
1.077 | !!! | 0.003337 +- 0.000108 | 0.003098 +- 0.000064 | ReduceByKey on 2097152 values with 104857 distinct vtkm::Id keys (vtkm::Vec< vtkm::UInt8, 4 >) |
1.066 | !!! | 0.002799 +- 0.000077 | 0.002627 +- 0.000063 | ReduceByKey on 2097152 values with 209715 distinct vtkm::Id keys (vtkm::Float32) |
0.895 | !!!! | 0.003179 +- 0.000066 | 0.003551 +- 0.000086 | ReduceByKey on 2097152 values with 209715 distinct vtkm::Id keys (vtkm::Float64) |
1.066 | !!! | 0.002812 +- 0.000080 | 0.002637 +- 0.000071 | ReduceByKey on 2097152 values with 209715 distinct vtkm::Id keys (vtkm::Int32) |
0.852 | !!!! | 0.003021 +- 0.000047 | 0.003546 +- 0.000091 | ReduceByKey on 2097152 values with 209715 distinct vtkm::Id keys (vtkm::Int64) |
0.869 | !!!! | 0.002277 +- 0.000063 | 0.002620 +- 0.000066 | ReduceByKey on 2097152 values with 209715 distinct vtkm::Id keys (vtkm::UInt32) |
1.372 | !!! | 0.002655 +- 0.000082 | 0.001935 +- 0.000043 | ReduceByKey on 2097152 values with 209715 distinct vtkm::Id keys (vtkm::UInt8) |
0.796 | !!!! | 0.004225 +- 0.000082 | 0.005308 +- 0.000130 | ReduceByKey on 2097152 values with 209715 distinct vtkm::Id keys (vtkm::Vec< vtkm::Float32, 4 >) |
0.705 | !!!! | 0.005007 +- 0.000076 | 0.007099 +- 0.000173 | ReduceByKey on 2097152 values with 209715 distinct vtkm::Id keys (vtkm::Vec< vtkm::Float64, 3 >) |
0.972 | !!!! | 0.003443 +- 0.000089 | 0.003543 +- 0.000085 | ReduceByKey on 2097152 values with 209715 distinct vtkm::Id keys (vtkm::Vec< vtkm::Int32, 2 >) |
1.026 | !!! | 0.003412 +- 0.000093 | 0.003325 +- 0.000057 | ReduceByKey on 2097152 values with 209715 distinct vtkm::Id keys (vtkm::Vec< vtkm::UInt8, 4 >) |
0.925 | !!!! | 0.002979 +- 0.000082 | 0.003219 +- 0.000095 | ReduceByKey on 2097152 values with 314572 distinct vtkm::Id keys (vtkm::Float32) |
0.752 | !!!! | 0.003277 +- 0.000084 | 0.004360 +- 0.000121 | ReduceByKey on 2097152 values with 314572 distinct vtkm::Id keys (vtkm::Float64) |
0.929 | !!!! | 0.003004 +- 0.000080 | 0.003235 +- 0.000091 | ReduceByKey on 2097152 values with 314572 distinct vtkm::Id keys (vtkm::Int32) |
0.703 | !!!! | 0.003054 +- 0.000045 | 0.004346 +- 0.000113 | ReduceByKey on 2097152 values with 314572 distinct vtkm::Id keys (vtkm::Int64) |
0.744 | !!!! | 0.002406 +- 0.000046 | 0.003234 +- 0.000092 | ReduceByKey on 2097152 values with 314572 distinct vtkm::Id keys (vtkm::UInt32) |
1.200 | !!! | 0.002848 +- 0.000077 | 0.002373 +- 0.000070 | ReduceByKey on 2097152 values with 314572 distinct vtkm::Id keys (vtkm::UInt8) |
0.685 | !!!! | 0.004510 +- 0.000082 | 0.006585 +- 0.000177 | ReduceByKey on 2097152 values with 314572 distinct vtkm::Id keys (vtkm::Vec< vtkm::Float32, 4 >) |
0.608 | !!!! | 0.005361 +- 0.000092 | 0.008815 +- 0.000283 | ReduceByKey on 2097152 values with 314572 distinct vtkm::Id keys (vtkm::Vec< vtkm::Float64, 3 >) |
0.800 | !!!! | 0.003527 +- 0.000081 | 0.004407 +- 0.000130 | ReduceByKey on 2097152 values with 314572 distinct vtkm::Id keys (vtkm::Vec< vtkm::Int32, 2 >) |
0.942 | !!!! | 0.003568 +- 0.000087 | 0.003787 +- 0.000087 | ReduceByKey on 2097152 values with 314572 distinct vtkm::Id keys (vtkm::Vec< vtkm::UInt8, 4 >) |
0.852 | !!!! | 0.003208 +- 0.000085 | 0.003767 +- 0.000107 | ReduceByKey on 2097152 values with 419430 distinct vtkm::Id keys (vtkm::Float32) |
0.688 | !!!! | 0.003515 +- 0.000078 | 0.005108 +- 0.000146 | ReduceByKey on 2097152 values with 419430 distinct vtkm::Id keys (vtkm::Float64) |
0.846 | !!!! | 0.003255 +- 0.000117 | 0.003849 +- 0.000194 | ReduceByKey on 2097152 values with 419430 distinct vtkm::Id keys (vtkm::Int32) |
0.654 | !!!! | 0.003378 +- 0.000199 | 0.005163 +- 0.000181 | ReduceByKey on 2097152 values with 419430 distinct vtkm::Id keys (vtkm::Int64) |
0.692 | !!!! | 0.002599 +- 0.000067 | 0.003756 +- 0.000113 | ReduceByKey on 2097152 values with 419430 distinct vtkm::Id keys (vtkm::UInt32) |
1.109 | !!! | 0.003064 +- 0.000083 | 0.002762 +- 0.000089 | ReduceByKey on 2097152 values with 419430 distinct vtkm::Id keys (vtkm::UInt8) |
0.621 | !!!! | 0.004787 +- 0.000088 | 0.007706 +- 0.000245 | ReduceByKey on 2097152 values with 419430 distinct vtkm::Id keys (vtkm::Vec< vtkm::Float32, 4 >) |
0.555 | !!!! | 0.005722 +- 0.000070 | 0.010315 +- 0.000339 | ReduceByKey on 2097152 values with 419430 distinct vtkm::Id keys (vtkm::Vec< vtkm::Float64, 3 >) |
0.705 | !!!! | 0.003584 +- 0.000079 | 0.005085 +- 0.000142 | ReduceByKey on 2097152 values with 419430 distinct vtkm::Id keys (vtkm::Vec< vtkm::Int32, 2 >) |
0.913 | !!!! | 0.003775 +- 0.000096 | 0.004136 +- 0.000107 | ReduceByKey on 2097152 values with 419430 distinct vtkm::Id keys (vtkm::Vec< vtkm::UInt8, 4 >) |
0.782 | !!!! | 0.003407 +- 0.000084 | 0.004356 +- 0.000131 | ReduceByKey on 2097152 values with 524288 distinct vtkm::Id keys (vtkm::Float32) |
0.618 | !!!! | 0.003693 +- 0.000100 | 0.005971 +- 0.000189 | ReduceByKey on 2097152 values with 524288 distinct vtkm::Id keys (vtkm::Float64) |
0.791 | !!!! | 0.003431 +- 0.000092 | 0.004337 +- 0.000125 | ReduceByKey on 2097152 values with 524288 distinct vtkm::Id keys (vtkm::Int32) |
0.597 | !!!! | 0.003535 +- 0.000057 | 0.005919 +- 0.000181 | ReduceByKey on 2097152 values with 524288 distinct vtkm::Id keys (vtkm::Int64) |
0.640 | !!!! | 0.002791 +- 0.000079 | 0.004361 +- 0.000134 | ReduceByKey on 2097152 values with 524288 distinct vtkm::Id keys (vtkm::UInt32) |
1.038 | !!! | 0.003293 +- 0.000092 | 0.003173 +- 0.000099 | ReduceByKey on 2097152 values with 524288 distinct vtkm::Id keys (vtkm::UInt8) |
0.565 | !!!! | 0.005061 +- 0.000090 | 0.008950 +- 0.000279 | ReduceByKey on 2097152 values with 524288 distinct vtkm::Id keys (vtkm::Vec< vtkm::Float32, 4 >) |
0.509 | !!!! | 0.006071 +- 0.000069 | 0.011929 +- 0.000387 | ReduceByKey on 2097152 values with 524288 distinct vtkm::Id keys (vtkm::Vec< vtkm::Float64, 3 >) |
0.621 | !!!! | 0.003702 +- 0.000080 | 0.005962 +- 0.000179 | ReduceByKey on 2097152 values with 524288 distinct vtkm::Id keys (vtkm::Vec< vtkm::Int32, 2 >) |
0.866 | !!!! | 0.003967 +- 0.000104 | 0.004581 +- 0.000134 | ReduceByKey on 2097152 values with 524288 distinct vtkm::Id keys (vtkm::Vec< vtkm::UInt8, 4 >) |
0.714 | !!!! | 0.003613 +- 0.000104 | 0.005061 +- 0.000150 | ReduceByKey on 2097152 values with 629145 distinct vtkm::Id keys (vtkm::Float32) |
0.575 | !!!! | 0.003951 +- 0.000088 | 0.006869 +- 0.000218 | ReduceByKey on 2097152 values with 629145 distinct vtkm::Id keys (vtkm::Float64) |
0.726 | !!!! | 0.003648 +- 0.000096 | 0.005022 +- 0.000164 | ReduceByKey on 2097152 values with 629145 distinct vtkm::Id keys (vtkm::Int32) |
0.515 | !!!! | 0.003568 +- 0.000063 | 0.006929 +- 0.000279 | ReduceByKey on 2097152 values with 629145 distinct vtkm::Id keys (vtkm::Int64) |
0.548 | !!!! | 0.002769 +- 0.000071 | 0.005050 +- 0.000172 | ReduceByKey on 2097152 values with 629145 distinct vtkm::Id keys (vtkm::UInt32) |
0.954 | !!!! | 0.003501 +- 0.000094 | 0.003669 +- 0.000106 | ReduceByKey on 2097152 values with 629145 distinct vtkm::Id keys (vtkm::UInt8) |
0.514 | !!!! | 0.005334 +- 0.000086 | 0.010371 +- 0.000354 | ReduceByKey on 2097152 values with 629145 distinct vtkm::Id keys (vtkm::Vec< vtkm::Float32, 4 >) |
0.434 | !!!! | 0.006438 +- 0.000081 | 0.014826 +- 0.000606 | ReduceByKey on 2097152 values with 629145 distinct vtkm::Id keys (vtkm::Vec< vtkm::Float64, 3 >) |
0.554 | !!!! | 0.003820 +- 0.000081 | 0.006893 +- 0.000213 | ReduceByKey on 2097152 values with 629145 distinct vtkm::Id keys (vtkm::Vec< vtkm::Int32, 2 >) |
0.793 | !!!! | 0.004121 +- 0.000113 | 0.005195 +- 0.000163 | ReduceByKey on 2097152 values with 629145 distinct vtkm::Id keys (vtkm::Vec< vtkm::UInt8, 4 >) |