Specialize Copy/CopySubRange for TBB.
Scaling isn't great on 4 cores + HT (except for UInt8 for some reason, which scales amazingly well), but there is an improvement over the old implementation (including UInt8, which sees performance double).
Speedup | Warn | serial | parallel | Benchmark (Type) |
---|---|---|---|---|
1.103 | !!! | 0.001691 +- 0.000115 | 0.001533 +- 0.000070 | Copy 2097152 values (vtkm::Float32) |
1.116 | !!! | 0.003537 +- 0.000401 | 0.003169 +- 0.000135 | Copy 2097152 values (vtkm::Float64) |
1.057 | !!! | 0.001631 +- 0.000112 | 0.001542 +- 0.000087 | Copy 2097152 values (vtkm::Int32) |
1.079 | !!! | 0.003488 +- 0.000200 | 0.003231 +- 0.000143 | Copy 2097152 values (vtkm::Int64) |
1.045 | !!! | 0.001635 +- 0.000111 | 0.001565 +- 0.000094 | Copy 2097152 values (vtkm::UInt32) |
7.990 | | 0.002501 +- 0.000186 | 0.000313 +- 0.000022 | Copy 2097152 values (vtkm::UInt8) |
1.113 | !!! | 0.006937 +- 0.000569 | 0.006232 +- 0.000153 | Copy 2097152 values (vtkm::Vec< vtkm::Float32, 4 >) |
1.143 | !!! | 0.010736 +- 0.001137 | 0.009390 +- 0.000203 | Copy 2097152 values (vtkm::Vec< vtkm::Float64, 3 >) |
1.035 | !!! | 0.003322 +- 0.000200 | 0.003210 +- 0.000140 | Copy 2097152 values (vtkm::Vec< vtkm::Int32, 2 >) |
1.080 | !!! | 0.001662 +- 0.000090 | 0.001539 +- 0.000077 | Copy 2097152 values (vtkm::Vec< vtkm::UInt8, 4 >) |