Reduction on CUDA handles different input and output types better
When reducing an input type that differs from the output type, you need to write a custom binary operator that also performs the unary transformation.
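To make the workaround concrete, here is a minimal host-side sketch of what such a combined operator can look like. It is an analogy written against C++17 `std::reduce`, not the CUDA reduction API this note is about. The names `Vec2`, `SumSquaredNorm`, and `sum_squared_norms` are hypothetical and chosen only for illustration. Because a reduction may combine elements in any grouping, the operator has to accept every mix of input and output types, so the unary transformation ends up duplicated across its overloads:

```cpp
#include <numeric>
#include <vector>

// Hypothetical input type: a 2D vector of floats.
struct Vec2 {
    float x;
    float y;
};

// Hypothetical combined operator: reduces Vec2 inputs to a double holding the
// sum of squared norms. The unary transformation (Vec2 -> double) is folded
// into the binary operator instead of being passed separately.
struct SumSquaredNorm {
    // The unary transformation applied to each input element.
    static double transform(const Vec2& v) {
        return double(v.x) * v.x + double(v.y) * v.y;
    }

    // One overload per combination of raw inputs and partial results.
    double operator()(double a, double b) const { return a + b; }
    double operator()(double a, const Vec2& b) const { return a + transform(b); }
    double operator()(const Vec2& a, double b) const { return transform(a) + b; }
    double operator()(const Vec2& a, const Vec2& b) const {
        return transform(a) + transform(b);
    }
};

// Reduce a sequence of Vec2 (input type) to a double (output type).
double sum_squared_norms(const std::vector<Vec2>& values) {
    return std::reduce(values.begin(), values.end(), 0.0, SumSquaredNorm{});
}
```

For example, calling `sum_squared_norms` on the two vectors (3, 4) and (1, 2) yields 30. The four overloads are the boilerplate in question: a reduction that accepted the transformation as a separate unary callable would need only the `double + double` case.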