Some virtual objects failing on CUDA with Visual Studio
When I try to compile and run VTK-m on my Windows desktop using Visual Studio 2017 with CUDA 10.1, several of the CUDA unit tests fail:
70:UnitTestCudaCellLocatorRectilinearGrid
71:UnitTestCudaCellLocatorUniformBins
72:UnitTestCudaCellLocatorUniformGrid
74:UnitTestCudaColorTable
82:UnitTestCudaPointLocatorUniformGrid
223:UnitTestBoundingIntervalHierarchyCUDA
258:UnitTestParticleAdvectionCUDA
262:UnitTestProbeCUDA
273:UnitTestTemporalAdvectionCUDA
353:UnitTestExternalFacesFilterCUDA
365:UnitTestLagrangianFilterCUDA
376:UnitTestProbeCUDA
378:UnitTestStreamlineFilterCUDA
398:UnitTestMapperConnectivityCUDA
399:UnitTestMultiMapperCUDA
401:UnitTestMapperQuadsCUDA
402:UnitTestMapperRayTracerCUDA
403:UnitTestMapperWireframerCUDA
Several of these failures come from an illegal instruction on a CUDA device. For example, here is the error from UnitTestCudaCellLocatorUniformBins
:
***** Uncaught VTKm exception thrown.
CUDA Error: an illegal instruction was encountered
cudaMemcpyAsync(controlPtr, executionPtr, static_cast<std::size_t>(numBytes), cudaMemcpyDeviceToHost, cudaStreamPerThread) @ C:/Users/kmorel/src/vtk-m/vtkm/cont/cuda/internal/ExecutionArrayInterfaceBasicCuda.cu:192
Although the error reports coming from cudaMemcpyAsync
, I suspect the actual problem comes from the kernel that was launched asynchronously before this call to pull data to the host. I believe the problem is that some (but not all) of the virtual objects used on the device are not being set up correctly, and that is leading to nasty problems like jumping to a bad location for a subroutine.
I suspect the core issue comes from windows specific behavior documented in Section F.3.10.6 of the CUDA Programming Guide. This section states, "The CUDA compiler follows the IA64 ABI for class layout, while the Microsoft host compiler does not." "The CUDA compiler may compute the class layout and size differently than the Microsoft host compiler for [certain types]." (Types with virtual methods fall into this category.) I believe at least some of the problems I am seeing come from trying to copy types of this nature from host to device.
Note: I am not 100% sure this is the cause of the problems I am seeing. I tried verifying by running the UnitTestCudaCellLocatorUniformBins
test in the Nsight debugger, but it has been running for over 16 hours now and still has not encountered the illegal instruction.