Update the cuda IteratorFromArrayPortal to use ptrdiff_t.
This makes the advance / distance_to function signatures the same regardless of whether we are building with 32-bit or 64-bit ids.