Copy speed benchmarks
Cleans up the device adapter algorithm benchmarks and makes it easier to modify settings such as the array size, the choice between a fixed buffer size and numValues, and the number of TBB threads used. Also reduces the default number of benchmarking iterations from 500 to 100 to shorten run time.
Adds a new benchmark that copies arrays of various sizes and prints the transfer speed.
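As a rough illustration of how a transfer speed like this can be measured, here is a minimal standalone sketch. It is not the benchmark code from this change: `ToGiBPerSecond` and `MeasureCopySpeed` are hypothetical helper names, and the sketch uses plain `std::vector` and `std::copy` rather than the device adapter algorithms.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the library): converts a byte count and
// an elapsed time in seconds into GiB/s.
inline double ToGiBPerSecond(std::size_t bytes, double seconds)
{
  constexpr double bytesPerGiB = 1024.0 * 1024.0 * 1024.0;
  return (static_cast<double>(bytes) / bytesPerGiB) / seconds;
}

// Hypothetical helper: runs `iterations` std::copy passes over a buffer of
// roughly `numBytes` bytes holding elements of type T and returns the mean
// copy speed in GiB/s.
template <typename T>
double MeasureCopySpeed(std::size_t numBytes, int iterations)
{
  assert(numBytes >= sizeof(T) && iterations > 0);

  const std::size_t numValues = numBytes / sizeof(T);
  std::vector<T> src(numValues);
  std::vector<T> dst(numValues);

  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i)
  {
    std::copy(src.begin(), src.end(), dst.begin());
  }
  const std::chrono::duration<double> elapsed =
    std::chrono::steady_clock::now() - start;

  // Total bytes moved across all iterations divided by total time.
  const std::size_t totalBytes =
    numValues * sizeof(T) * static_cast<std::size_t>(iterations);
  return ToGiBPerSecond(totalBytes, elapsed.count());
}
```

Calling `MeasureCopySpeed<unsigned char>(std::size_t(256) << 20, 100)` would time 100 copies of a 256 MiB buffer. An optimizing compiler may remove copies whose results are never read, so a real benchmark should consume the destination buffer afterwards.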
The new copy benchmarks highlighted an issue: std::copy of Pairs and Vecs was not being optimized to memcpy. For a 256 MiB buffer on my laptop with GCC, the serial copy speeds were:
UInt8: 10.10 GiB/s
Vec<UInt8, 2>: 3.12 GiB/s
Pair<UInt32, Float32>: 6.92 GiB/s
After the last patch, which ensures triviality of these containers, the optimization occurs:
UInt8: 10.12 GiB/s
Vec<UInt8, 2>: 9.66 GiB/s
Pair<UInt32, Float32>: 9.88 GiB/s
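
To illustrate why triviality matters here, the sketch below contrasts two hypothetical pair types (`HandWrittenPair` and `DefaultedPair`, neither of which is the actual Pair from this change). Standard library implementations typically lower std::copy over contiguous ranges to a memmove/memcpy only when the element type is trivially copyable, and a user-provided copy constructor is enough to lose that property. `IsMemcpyEligible` is likewise an assumed helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical pair with a user-provided copy constructor. Even though the
// constructor only copies the members, it makes the type non-trivially
// copyable.
template <typename T, typename U>
struct HandWrittenPair
{
  T first;
  U second;

  HandWrittenPair() = default;
  HandWrittenPair(const HandWrittenPair& other)
    : first(other.first)
    , second(other.second)
  {
  }
  HandWrittenPair& operator=(const HandWrittenPair&) = default;
};

// Hypothetical pair whose special members are all defaulted on their first
// declaration, so the type stays trivially copyable when T and U are.
template <typename T, typename U>
struct DefaultedPair
{
  T first;
  U second;

  DefaultedPair() = default;
  DefaultedPair(const DefaultedPair&) = default;
  DefaultedPair& operator=(const DefaultedPair&) = default;
};

// Hypothetical helper: reports whether a type meets the language-level
// requirement for being copied bytewise.
template <typename T>
constexpr bool IsMemcpyEligible()
{
  return std::is_trivially_copyable<T>::value;
}

static_assert(!IsMemcpyEligible<HandWrittenPair<std::uint32_t, float>>(),
              "user-provided copy constructor blocks the bytewise copy");
static_assert(IsMemcpyEligible<DefaultedPair<std::uint32_t, float>>(),
              "defaulted special members keep the type trivially copyable");
```

Keeping a static_assert like this next to a container definition is one way to catch a later change that accidentally reintroduces a non-trivial special member.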