ArrayHandleDiscard not usable by many device algorithms
We should add a unit test that runs each relevant DeviceAdapterAlgorithm using an ArrayHandleDiscard for all outputs to ensure that they work as expected.
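For example, most CPU parallel ReduceByKey implementations currently read back from the output array during the reduction step to check for key ranges that overlap block boundaries. This cannot work with ArrayHandleDiscard, which discards every value written to it. We should modify these algorithms so that they only write to their outputs.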
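To illustrate the kind of change and test being proposed, here is a minimal serial sketch in plain C++. It does not use the VTK-m API: `DiscardOutput`, `VectorOutput`, and `ReduceByKeySketch` are hypothetical stand-ins invented for this example. The idea is that the reduction keeps the current key and running value in local variables and only ever writes to its outputs, so the same code works whether the outputs are stored or discarded. A parallel version would presumably need a similar side buffer for the per-block boundary keys instead of reading them back from the output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a discard-style output (not the VTK-m class).
// Writes are dropped and there is deliberately no Get(), so any algorithm
// that tries to read its output fails to compile against this type.
template <typename T>
struct DiscardOutput
{
  std::size_t size = 0;
  void Resize(std::size_t n) { size = n; }
  void Set(std::size_t, const T&) {}
};

// Hypothetical stand-in for an ordinary output that stores its values.
template <typename T>
struct VectorOutput
{
  std::vector<T> data;
  void Resize(std::size_t n) { data.resize(n); }
  void Set(std::size_t i, const T& v) { data[i] = v; }
};

// Serial reduce-by-key (sum) over runs of equal adjacent keys.
// The current key and running sum live in locals; outputs are write-only.
// Returns the number of unique key runs written.
template <typename K, typename V, typename KeysOut, typename ValuesOut>
std::size_t ReduceByKeySketch(const std::vector<K>& keys,
                              const std::vector<V>& values,
                              KeysOut& keysOut,
                              ValuesOut& valuesOut)
{
  assert(keys.size() == values.size());
  keysOut.Resize(keys.size());
  valuesOut.Resize(keys.size());
  if (keys.empty())
  {
    return 0;
  }

  std::size_t outIdx = 0;
  K currentKey = keys[0];
  V currentValue = values[0];
  for (std::size_t i = 1; i < keys.size(); ++i)
  {
    if (keys[i] == currentKey)
    {
      currentValue = currentValue + values[i];
    }
    else
    {
      // Flush the finished run, then start a new one.
      keysOut.Set(outIdx, currentKey);
      valuesOut.Set(outIdx, currentValue);
      ++outIdx;
      currentKey = keys[i];
      currentValue = values[i];
    }
  }
  // Flush the final run.
  keysOut.Set(outIdx, currentKey);
  valuesOut.Set(outIdx, currentValue);
  ++outIdx;

  // Shrink outputs to the number of runs actually produced.
  keysOut.Resize(outIdx);
  valuesOut.Resize(outIdx);
  return outIdx;
}
```

A unit test along the lines suggested above could call the algorithm once with storing outputs to check the values, and once with discard outputs for every output argument to check that it still compiles and runs. In this sketch that means passing two `DiscardOutput<int>` objects to `ReduceByKeySketch` and checking only the returned count.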