Merge branch 'correct_cuda_scan' into 'master'
Correct undefined behavior that was causing scan test failures.

We need to call PrepareForInput on the input argument before invoking the function. The order in which a function's arguments are evaluated is unspecified, so we need to make sure the input is prepared before the output, or else the in-place use case breaks.

See merge request !44
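
For illustration, here is a minimal host-only sketch of the ordering problem described above. ToyArrayHandle, ScanInclusive, ScanInclusiveOutputFirst and InclusiveScanPortals are hypothetical names invented for this example, and the upload and discard behavior of the toy class is an assumption; this is not the actual change in this merge request. Only the method names PrepareForInput and PrepareForOutput come from the message.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an array handle with separate host and "device"
// copies. This is NOT the project's real class; its behavior is assumed.
class ToyArrayHandle {
public:
  explicit ToyArrayHandle(std::vector<int> values) : host_(std::move(values)) {}

  std::size_t Size() const { return deviceValid_ ? device_.size() : host_.size(); }

  // Makes the data readable on the "device", uploading the host copy if needed.
  const int* PrepareForInput() {
    if (!deviceValid_) {
      device_ = host_;
      deviceValid_ = true;
    }
    return device_.data();
  }

  // Makes room for n values on the "device". Existing contents survive only
  // if they were already uploaded; otherwise the host copy is discarded.
  int* PrepareForOutput(std::size_t n) {
    if (!deviceValid_) {
      device_.assign(n, 0);
      deviceValid_ = true;
    } else if (device_.size() != n) {
      device_.resize(n);
    }
    return device_.data();
  }

  std::vector<int> ReadBack() const { return deviceValid_ ? device_ : host_; }

private:
  std::vector<int> host_;
  std::vector<int> device_;
  bool deviceValid_ = false;
};

// Inclusive prefix sum; safe when in == out because in[i] is read before
// out[i] is written.
inline void InclusiveScanPortals(const int* in, int* out, std::size_t n) {
  int running = 0;
  for (std::size_t i = 0; i < n; ++i) {
    running += in[i];
    out[i] = running;
  }
}

// Safe pattern: separate statements force input preparation to happen first.
inline void ScanInclusive(ToyArrayHandle& input, ToyArrayHandle& output) {
  const std::size_t n = input.Size();
  const int* in = input.PrepareForInput();
  int* out = output.PrepareForOutput(n);
  assert(n == 0 || (in != nullptr && out != nullptr));
  InclusiveScanPortals(in, out, n);
}

// What may effectively happen when both Prepare calls are written as
// arguments of one call and the compiler evaluates the output one first.
inline void ScanInclusiveOutputFirst(ToyArrayHandle& input, ToyArrayHandle& output) {
  const std::size_t n = input.Size();
  int* out = output.PrepareForOutput(n);    // discards the data when in-place
  const int* in = input.PrepareForInput();  // too late: input is already gone
  InclusiveScanPortals(in, out, n);
}
```

In this sketch, passing the same handle as both input and output to ScanInclusive scans the original values, while ScanInclusiveOutputFirst scans a buffer whose contents were already discarded. Writing both Prepare calls directly as arguments of a single call leaves the choice between these two orders to the compiler.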