CUDA Thrust Reduce can generate a compile error for sufficiently complex array handles
When you try to call the Reduce operation in the CUDA device adapter with a sufficiently complex iterator type, you get a compile error that says "error: cannot pass an argument with a user-provided copy-constructor to a device-side kernel launch".
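As a rough illustration of what the message refers to, here is a minimal plain C++ sketch of two iterator-like types, one with a compiler-generated copy constructor and one with a user-provided copy constructor. The type names and the HasTrivialCopy helper are hypothetical and are not part of VTK-m or Thrust. Using std::is_trivially_copy_constructible as a proxy for what nvcc checks is an assumption and may not match the compiler's actual rule.

```cpp
#include <type_traits>

// Hypothetical iterator-like type. Its copy constructor is generated by the
// compiler, so the type is trivially copy constructible.
struct SimpleIterator
{
  const float* Pointer = nullptr;
};

// Hypothetical iterator-like type with a user-provided copy constructor,
// which is the kind of type the error message describes.
struct ComplexIterator
{
  const float* Pointer = nullptr;
  int ActiveIndex = 0;

  ComplexIterator() = default;
  ComplexIterator(const ComplexIterator& src)
    : Pointer(src.Pointer)
    , ActiveIndex(src.ActiveIndex)
  {
  }
};

// Hypothetical helper: reports whether T can be copied trivially.
// ASSUMPTION: this trait only approximates the condition nvcc checks.
template <typename T>
constexpr bool HasTrivialCopy()
{
  return std::is_trivially_copy_constructible<T>::value;
}

static_assert(HasTrivialCopy<SimpleIterator>(),
              "compiler-generated copy constructor is trivial");
static_assert(!HasTrivialCopy<ComplexIterator>(),
              "user-provided copy constructor is not trivial");
```

A check like this could help narrow down which iterator types are likely to trip the diagnostic, but it does not explain why Thrust ends up passing the iterator to a device-side launch in the first place.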
This appears to be a bug in either nvcc or Thrust. I believe it is related to the following reported issues:
I have noticed that this happens with ArrayHandleMultiplexer with some types of sub-ArrayHandles. I've posted the following commit that demonstrates the problem when you try to compile it: kmorel/vtk-m@7e92893c. I believe you can check out this commit with the following commands.
git fetch "git@gitlab.kitware.com:kmorel/vtk-m.git" "7e92893c82350c1e2fc15bf8a00c2ec09c5206ab"
git checkout FETCH_HEAD
Note that MR !2168 (merged) works around this issue by creating a specialization of the Reduce operation for the CUDA device adapter that reverts to the general version of this algorithm. This works (for now) but is not a great solution. First, the general algorithm is likely to be slower than the Thrust version. Second, there could be other complex ArrayHandle types we use in the future that could run into the same problem, at which time we would have to create more exceptions.
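The shape of that workaround can be sketched in plain C++ as follows. This is not the code from the MR: SimpleArray, ComplexArray, GetValues, GeneralReduce, and OptimizedReduce are all hypothetical stand-ins, and std::accumulate stands in for the Thrust call. Each function also returns the name of the path it took, purely so the dispatch is visible.

```cpp
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an array type the optimized path handles fine.
using SimpleArray = std::vector<int>;

// Hypothetical stand-in for a complex array type (such as a multiplexer).
struct ComplexArray
{
  std::vector<int> Values;
};

inline const std::vector<int>& GetValues(const SimpleArray& a) { return a; }
inline const std::vector<int>& GetValues(const ComplexArray& a) { return a.Values; }

// General algorithm: a plain loop that works for every array type.
template <typename ArrayType>
std::pair<int, std::string> GeneralReduce(const ArrayType& input, int initial)
{
  int sum = initial;
  for (int value : GetValues(input))
  {
    sum += value;
  }
  return { sum, "general" };
}

// Stand-in for the backend-optimized path (Thrust in the CUDA adapter).
template <typename ArrayType>
std::pair<int, std::string> OptimizedReduce(const ArrayType& input, int initial)
{
  const std::vector<int>& values = GetValues(input);
  return { std::accumulate(values.begin(), values.end(), initial), "optimized" };
}

// Device-level entry point: uses the optimized path by default.
template <typename ArrayType>
std::pair<int, std::string> Reduce(const ArrayType& input, int initial)
{
  return OptimizedReduce(input, initial);
}

// Exception for the problematic type: revert to the general algorithm.
// Every new problematic type would need another overload like this one.
inline std::pair<int, std::string> Reduce(const ComplexArray& input, int initial)
{
  return GeneralReduce(input, initial);
}
```

The sketch makes both drawbacks concrete: the ComplexArray overload never reaches the optimized path, and the list of exception overloads has to be maintained by hand as new array types appear.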