Starting in thrust 1.8 the implementation of scan inclusive inside thrust became highly optimized by using parallel task groups. This new implementation has a bug that only exists when using custom binary operators, large size arrays, release mode, and no debugger or mem-checker attached.
While I have submitted the issue to thrust, we need to be able to work around the existing issue. The solution I have chosen is to mark all vtkm::exec::cuda::interal::WrappedBinaryOperators as being commutative as far as thrust is concerened. To make sure we don't get any unexpected behavior I have also had to create WrappedBinaryPredicate so that we don't mark any predicate as commutative.