Use std::copy in serial/TBB Copy implementation.

Allison Vacanti requested to merge allisonvacanti/vtk-m:serial_copy into master

I had assumed that the compiler would be clever enough to turn the iterative implementation of Copy into a memcpy, but inspecting the disassembly on a release GCC build shows that this is not the case, likely because it can't assume that the memory ranges do not overlap.
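The overlap concern can be illustrated with a small standalone sketch. The helpers below (CopyLoop, CopyStd, ShiftRightWithLoop, ShiftRightWithMemmove) are hypothetical. They work on raw pointers rather than VTK-m array portals and are not the code changed in this merge request. When the destination starts inside the source range, the forward element-by-element loop re-reads values it has just written, so its result differs from memmove (and memcpy is undefined for overlapping ranges). A compiler can therefore only replace such a loop with a memcpy/memmove call if it can rule this overlap out. std::copy instead makes non-overlap a precondition on the caller (the destination must not start inside [first, last)). Standard library implementations such as libstdc++ typically lower it to memmove for raw pointers to trivially copyable types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical element-by-element copy, standing in for a hand-written Copy loop.
template <typename T>
void CopyLoop(const T* in, T* out, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
  {
    out[i] = in[i]; // if the ranges overlap, this store can change a later in[j]
  }
}

// Hypothetical std::copy-based copy. std::copy requires that `out` does not
// point into [in, in + n); the assert checks that precondition in debug builds.
template <typename T>
void CopyStd(const T* in, T* out, std::size_t n)
{
  assert(!(std::greater_equal<const T*>{}(out, in) && std::less<const T*>{}(out, in + n)));
  std::copy(in, in + n, out);
}

// Shift {1, 2, 3, 4, 5} right by one element in place with the loop.
// Each iteration reads the value written by the previous one.
std::vector<std::uint8_t> ShiftRightWithLoop()
{
  std::vector<std::uint8_t> buf{1, 2, 3, 4, 5};
  CopyLoop(buf.data(), buf.data() + 1, buf.size() - 1); // source and destination overlap
  return buf;
}

// The same shift with memmove, which behaves as if it copied through a temporary buffer.
std::vector<std::uint8_t> ShiftRightWithMemmove()
{
  std::vector<std::uint8_t> buf{1, 2, 3, 4, 5};
  std::memmove(buf.data() + 1, buf.data(), buf.size() - 1);
  return buf;
}
```

ShiftRightWithLoop() returns {1, 1, 1, 1, 1}, while ShiftRightWithMemmove() returns {1, 1, 2, 3, 4}. For non-overlapping ranges CopyLoop and CopyStd produce identical results.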

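Replacing the loop with std::copy speeds things up by about 30-50% for most scalar types (speedups of 1.30 to 1.49). The Vec types see a slight slowdown, usually under 5% (speedups of 0.965 to 0.987), with Vec< vtkm::UInt8, 4 > the outlier at 0.851. The UInt8 copy improved by a factor of about 7.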

Comparison (Speedup = iteration time / std::copy time, so values above 1 mean std::copy is faster):

Speedup  Iteration              std::copy              Benchmark (Type)
1.363    0.001590 +- 0.000087   0.001166 +- 0.000049   Copy 2097152 values (vtkm::Float32)
1.487    0.003429 +- 0.000185   0.002305 +- 0.000146   Copy 2097152 values (vtkm::Float64)
1.379    0.001568 +- 0.000072   0.001137 +- 0.000093   Copy 2097152 values (vtkm::Int32)
1.420    0.003410 +- 0.000173   0.002402 +- 0.000101   Copy 2097152 values (vtkm::Int64)
1.303    0.001564 +- 0.000083   0.001201 +- 0.000078   Copy 2097152 values (vtkm::UInt32)
7.204    0.002441 +- 0.000104   0.000339 +- 0.000029   Copy 2097152 values (vtkm::UInt8)
0.987    0.006602 +- 0.000266   0.006688 +- 0.000291   Copy 2097152 values (vtkm::Vec< vtkm::Float32, 4 >)
0.965    0.010065 +- 0.000528   0.010427 +- 0.000617   Copy 2097152 values (vtkm::Vec< vtkm::Float64, 3 >)
0.979    0.003327 +- 0.000191   0.003398 +- 0.000142   Copy 2097152 values (vtkm::Vec< vtkm::Int32, 2 >)
0.851    0.001579 +- 0.000090   0.001856 +- 0.000098   Copy 2097152 values (vtkm::Vec< vtkm::UInt8, 4 >)
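To reproduce this kind of comparison outside VTK-m, a minimal standalone timing sketch follows. It is not the VTK-m benchmark harness that produced the numbers above. LoopCopy, StdCopy, BestTimeSeconds and MeasureSpeedup are hypothetical names, the copies operate on raw pointers rather than VTK-m array portals, and results will depend on the compiler, flags and hardware. It reports speedup in the same sense as the table: loop time divided by std::copy time.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical element-by-element copy (stand-in for the original loop).
template <typename T>
void LoopCopy(const T* in, T* out, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
  {
    out[i] = in[i];
  }
}

// Hypothetical std::copy-based copy (stand-in for the new version).
template <typename T>
void StdCopy(const T* in, T* out, std::size_t n)
{
  std::copy(in, in + n, out);
}

// Returns the fastest of `reps` runs of copyFn(in, out, n), in seconds.
template <typename T, typename CopyFn>
double BestTimeSeconds(CopyFn copyFn, const std::vector<T>& in, std::vector<T>& out, int reps)
{
  double best = 0.0;
  for (int r = 0; r < reps; ++r)
  {
    const auto start = std::chrono::steady_clock::now();
    copyFn(in.data(), out.data(), in.size());
    const auto stop = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(stop - start).count();
    if (r == 0 || seconds < best)
    {
      best = seconds;
    }
  }
  return best;
}

// Speedup as defined in the table: loop time / std::copy time.
template <typename T>
double MeasureSpeedup(std::size_t n, int reps)
{
  const std::vector<T> in(n, T(1));
  std::vector<T> out(n, T(0));
  auto loopFn = [](const T* src, T* dst, std::size_t count) { LoopCopy(src, dst, count); };
  auto stdFn = [](const T* src, T* dst, std::size_t count) { StdCopy(src, dst, count); };

  const double loopTime = BestTimeSeconds(loopFn, in, out, reps);
  assert(out == in); // the loop produced an exact copy
  std::fill(out.begin(), out.end(), T(0));
  const double stdTime = BestTimeSeconds(stdFn, in, out, reps);
  assert(out == in); // std::copy produced an exact copy
  return loopTime / stdTime;
}
```

For example, MeasureSpeedup<std::uint8_t>(2097152, 10) uses the same array size as the rows above. Building with optimizations enabled (for example -O2 -DNDEBUG) approximates a release build. Taking the minimum over repetitions is one way to reduce timing noise; the table instead reports each time with an error bar. Because the compiler sees raw pointers here rather than VTK-m's portal-based loop, it may optimize the loop differently, so the ratio need not match the table.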