Do not use volatile when calling CUDA atomicCAS
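Although it seems natural to declare a pointer volatile when it is the
target of an atomic operation, the atomicCAS overloads take regular
(non-volatile) pointers. C++ does not implicitly convert a pointer to
volatile into a pointer to non-volatile, so overload resolution can fail
with a compile error if the argument is volatile-qualified.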
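
A minimal sketch of the problem and one possible workaround follows. The kernel name claim_slot, the host helper run_claim_slot_demo, and the launch configuration are hypothetical and chosen only for illustration. It assumes the memory was not originally declared volatile (here it comes from cudaMalloc), so casting the qualifier away for the atomic call is safe. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: every thread tries to claim a single slot by
// swapping the flag from 0 to 1. Exactly one thread should succeed.
__global__ void claim_slot(volatile int* flag, int* winners)
{
    // atomicCAS(flag, 0, 1);
    // The line above would not compile: the atomicCAS overloads take
    // non-volatile pointers, and volatile int* does not convert to int*.

    // Workaround (assumption: the underlying object is not itself
    // declared volatile): drop the qualifier only for the atomic call.
    int old = atomicCAS(const_cast<int*>(flag), 0, 1);

    if (old == 0) {
        // This thread saw 0 and installed 1, so it won the slot.
        atomicAdd(winners, 1);
    }
}

// Hypothetical host helper: launches the kernel and returns how many
// threads won the compare-and-swap.
int run_claim_slot_demo()
{
    int* d_flag = nullptr;
    int* d_winners = nullptr;
    cudaMalloc(&d_flag, sizeof(int));
    cudaMalloc(&d_winners, sizeof(int));
    cudaMemset(d_flag, 0, sizeof(int));
    cudaMemset(d_winners, 0, sizeof(int));

    // int* converts implicitly to volatile int* at the call site.
    claim_slot<<<4, 64>>>(d_flag, d_winners);

    int winners = -1;
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(&winners, d_winners, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_flag);
    cudaFree(d_winners);
    return winners;
}
```

The simpler fix, where the surrounding code allows it, is to drop volatile from the pointer declaration altogether so no cast is needed.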