Fix crash in CUDA compiler
Previously, compiling PointLocatorUniformGrid.h with the CUDA compiler made the compiler crash. Apparently, during the ptxas stage the compiler goes into runaway recursion and runs out of stack space. This appears to be a long-standing bug in CUDA (it has been present across multiple releases), and there is no clear explanation for why it only sometimes rears its ugly head. (See, for example, https://devtalk.nvidia.com/default/topic/1028825/cuda-programming-and-performance/-ptxas-died-with-status-0xc00000fd-stack_overflow-/)

The problem appears to be triggered by a doubly or triply nested loop over a box of values to check in the uniform array. This change appears to fix the problem by converting the nested loops into a single for loop and using some index arithmetic to convert the flat loop index back into 3D indices.