Move VTK-m Cuda backend over to a grid-stride iteration pattern.
This allows for easier host side logic when determining grid and block sizes, and allows for a smaller library side by moving some logic into compiled in functions.
This allows for easier host side logic when determining grid and block sizes, and allows for a smaller library side by moving some logic into compiled in functions.