ArrayHandleZip performance implications
Many of the `ByKey` methods in `DeviceAdapterAlgorithm` implementations use a 'building block' approach: they combine the inputs with the keys into an `ArrayHandleZip` and call a simpler version of the algorithm. We've noticed a large overhead in these methods, possibly due to the temporary `vtkm::Pair`s created in `ArrayPortalZip::Get` and in the related `WrappedBinaryOperator`/`ArrayPortalValueReference::Swap`.
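To make the building-block pattern concrete, here is a minimal, self-contained C++ sketch. It does not use VTK-m. `ZipPortalSketch`, `SortPortal`, and `SortByKeySketch` are hypothetical names, and the portal is only a simplified model of `ArrayPortalZip` in which `Get` returns a `std::pair` by value. It shows how a generic sort written against `Get`/`Set` builds a temporary pair on every read and every move, even though the comparator only looks at the key. Insertion sort is used purely to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of a zip portal (NOT VTK-m's ArrayPortalZip).
// Get() builds a temporary std::pair by value from two separate arrays;
// Set() scatters a pair back into them.
template <typename K, typename V>
struct ZipPortalSketch
{
  std::vector<K>* Keys;
  std::vector<V>* Values;

  std::size_t GetNumberOfValues() const { return Keys->size(); }

  std::pair<K, V> Get(std::size_t i) const { return { (*Keys)[i], (*Values)[i] }; }

  void Set(std::size_t i, const std::pair<K, V>& p) const
  {
    (*Keys)[i] = p.first;
    (*Values)[i] = p.second;
  }
};

// Hypothetical "simpler" algorithm: an insertion sort over any portal that
// exposes Get/Set. Every comparison and every move goes through Get()/Set(),
// so each step materializes temporary pairs.
template <typename Portal, typename Compare>
void SortPortal(const Portal& portal, Compare comp)
{
  const std::size_t n = portal.GetNumberOfValues();
  for (std::size_t i = 1; i < n; ++i)
  {
    auto current = portal.Get(i); // temporary pair
    std::size_t j = i;
    while (j > 0 && comp(current, portal.Get(j - 1))) // another temporary pair
    {
      portal.Set(j, portal.Get(j - 1)); // and another, just to shift one slot
      --j;
    }
    portal.Set(j, current);
  }
}

// Hypothetical building-block SortByKey: zip keys with values, then sort the
// zipped view with a comparator that only looks at the key.
template <typename K, typename V>
void SortByKeySketch(std::vector<K>& keys, std::vector<V>& values)
{
  assert(keys.size() == values.size()); // zipping requires equal lengths
  ZipPortalSketch<K, V> zipped{ &keys, &values };
  SortPortal(zipped, [](const std::pair<K, V>& a, const std::pair<K, V>& b) {
    return a.first < b.first;
  });
}
```

For example, sorting keys `{3, 1, 2}` with values `{'c', 'a', 'b'}` yields keys `{1, 2, 3}` and values `{'a', 'b', 'c'}`.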
The following benchmark highlights this issue: BenchmarkZipArraySort.cxx
Test | Serial | TBB | CUDA |
---|---|---|---|
Sort Basic Array | 1.864s | 0.638s | 0.457s |
Sort Zip Array | 2.884s | 1.229s | 2.390s |
Sort Zip Array w/ KeyComparator | 2.513s | 1.167s | XXXXXX |
It may be worthwhile to look at either optimizing the zip portals or refactoring the algorithms to remove them.
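One possible direction for the refactoring option is to sort an index permutation by key and then gather the keys and values through it, so that no key/value pairs are built during the sort. The sketch below is written under stated assumptions: it uses only the C++ standard library and runs serially, `SortByKeyViaPermutation` is a hypothetical name, and it is not how VTK-m implements `SortByKey`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical zip-free SortByKey: sort an index permutation by key, then
// gather keys and values through it. The comparator reads keys directly,
// so no key/value pairs are created while sorting.
template <typename K, typename V>
void SortByKeyViaPermutation(std::vector<K>& keys, std::vector<V>& values)
{
  assert(keys.size() == values.size());

  // perm = 0, 1, 2, ..., n-1
  std::vector<std::size_t> perm(keys.size());
  std::iota(perm.begin(), perm.end(), std::size_t{ 0 });

  // stable_sort keeps equal keys in their original relative order.
  std::stable_sort(perm.begin(), perm.end(), [&keys](std::size_t a, std::size_t b) {
    return keys[a] < keys[b];
  });

  // Gather both arrays through the permutation into fresh storage.
  std::vector<K> sortedKeys;
  std::vector<V> sortedValues;
  sortedKeys.reserve(keys.size());
  sortedValues.reserve(values.size());
  for (std::size_t idx : perm)
  {
    sortedKeys.push_back(keys[idx]);
    sortedValues.push_back(values[idx]);
  }
  keys.swap(sortedKeys);
  values.swap(sortedValues);
}
```

The trade-off is an extra index array plus a gather pass with scattered reads. Whether this beats the zip approach on TBB or CUDA would have to be measured with the benchmark above.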