Draft: Fix long CUDA compile time for MergePartitionedDataSet

This MR attempts to fix the long CUDA compile time of the MergePartitionedDataSet filter, as discussed in this issue.
