MergePartitionedDataSet takes too long to compile
There have been reports that MergePartitionedDataSet.cxx takes too long to compile with CUDA, and the same may be true of other device compilers. We should spend some time reducing the compile time, possibly by breaking the compile into pieces.
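
One way "breaking the compile into pieces" could look is to move template instantiations into separate source files, so that each piece is compiled independently (and in parallel). The following is only a minimal sketch of that general C++ technique, not the actual MergePartitionedDataSet code: `MergeField`, the file names in the comments, and the choice of `float`/`double` are all hypothetical.

```cpp
// Sketch only: everything is shown in one file so it compiles standalone.
// The comments mark where each part would live if it were split up.
#include <vector>

// --- MergeField.h (hypothetical header) ---------------------------------
// A stand-in for an expensive templated routine.
template <typename T>
std::vector<T> MergeField(const std::vector<std::vector<T>>& partitions)
{
  std::vector<T> merged;
  for (const auto& part : partitions)
  {
    merged.insert(merged.end(), part.begin(), part.end());
  }
  return merged;
}

// Explicit instantiation declarations: translation units that include the
// header do not instantiate these specializations themselves.
extern template std::vector<float> MergeField<float>(
  const std::vector<std::vector<float>>&);
extern template std::vector<double> MergeField<double>(
  const std::vector<std::vector<double>>&);

// --- MergeFieldFloat.cxx (hypothetical source file) ---------------------
// Explicit instantiation definition: the only place the float version is
// compiled.
template std::vector<float> MergeField<float>(
  const std::vector<std::vector<float>>&);

// --- MergeFieldDouble.cxx (hypothetical source file) --------------------
// Same for double, in its own file so the two can build in parallel.
template std::vector<double> MergeField<double>(
  const std::vector<std::vector<double>>&);
```

Callers are unchanged; for example, `MergeField<float>({{1.f, 2.f}, {3.f}})` still resolves to the instantiation compiled in the float source file. Whether this helps here depends on where the compile time is actually going, which would need to be measured first.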