CUDA Unit Tests failing on CUDA 10.1

A large number of unit tests (23 in the run below) fail with CUDA 10.1.

$ ctest --rerun-failed
Total Test time (real) =  12.26 sec

The following tests FAILED:
	 66 - UnitTestCudaArrayHandle (Failed)
	 67 - UnitTestCudaArrayHandleFancy (Failed)
	 68 - UnitTestCudaArrayHandleVirtualCoordinates (Failed)
	 69 - UnitTestCudaBitField (Failed)
	 70 - UnitTestCudaCellLocatorRectilinearGrid (Child aborted)
	 71 - UnitTestCudaCellLocatorUniformBins (Child aborted)
	 72 - UnitTestCudaCellLocatorUniformGrid (Failed)
	 73 - UnitTestCudaComputeRange (Failed)
	 74 - UnitTestCudaColorTable (Failed)
	 75 - UnitTestCudaDataSetExplicit (Failed)
	 76 - UnitTestCudaDataSetSingleType (Child aborted)
	 77 - UnitTestCudaDeviceAdapter (Failed)
	 78 - UnitTestCudaGeometry (Failed)
	 79 - UnitTestCudaImplicitFunction (Failed)
	 80 - UnitTestCudaMath (Failed)
	 82 - UnitTestCudaPointLocatorUniformGrid (Child aborted)
	 83 - UnitTestCudaVirtualObjectHandle (Failed)
	109 - UnitTestDataSetBuilderExplicit (Failed)
	110 - UnitTestDataSetBuilderRectilinear (Failed)
	118 - UnitTestFieldRangeCompute (Failed)
	122 - UnitTestMultiBlock (Failed)
	131 - UnitTestFieldRangeGlobalCompute (Failed)
	133 - UnitTestSerializationDataSet (Failed)
Errors while running CTest
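To make it easier to see which tests merely fail and which abort outright, the list above can be grouped by status. The following is only a rough sketch: `group_failures` is a hypothetical helper (not part of VTK-m or CTest), and it assumes the `N - Name (Status)` line layout shown in the output above. The sample embeds just a few lines from that output.

```python
import re
from collections import defaultdict

# Matches summary entries such as "\t 66 - UnitTestCudaArrayHandle (Failed)".
# The layout is assumed from the ctest output pasted in this report.
ENTRY_RE = re.compile(r'^\s*(\d+)\s+-\s+(\S+)\s+\((.+)\)\s*$')


def group_failures(ctest_output):
    """Group failing tests by their reported status (hypothetical helper)."""
    groups = defaultdict(list)
    for line in ctest_output.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            number, name, status = match.groups()
            groups[status].append((int(number), name))
    return dict(groups)


# A few lines copied from the output above, not the full list.
sample = """\
The following tests FAILED:
\t 66 - UnitTestCudaArrayHandle (Failed)
\t 70 - UnitTestCudaCellLocatorRectilinearGrid (Child aborted)
\t 71 - UnitTestCudaCellLocatorUniformBins (Child aborted)
\t109 - UnitTestDataSetBuilderExplicit (Failed)
Errors while running CTest
"""

groups = group_failures(sample)
for status, tests in sorted(groups.items()):
    print(status, [name for _, name in tests])
```

Piping the full `ctest --rerun-failed` output into the same function would give the complete breakdown.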

More info on a single failing unit test (it appears that the cause is the same for all of them):

$ ./bin/UnitTests_vtkm_cont_cuda_testing UnitTestCudaArrayHandle
*** vtkm::UInt8 ***************
Try operations on empty arrays.
*** vtkm::Int64 ***************
Try operations on empty arrays.
*** vtkm::Float32 ***************
Try operations on empty arrays.
*** vtkm::Vec< vtkm::Float64, 3 > ***************
Try operations on empty arrays.
*** vtkm::UInt8 ***************
Check array with user provided memory.
Check out execution array behavior.
***** Uncaught VTKm exception thrown.
CUDA Error: invalid device function
Unchecked asynchronous error @ /home/4nt/vtk-m/vtkm/cont/cuda/internal/CudaAllocator.cu:110

And under cuda-memcheck:

$ /usr/local/cuda-10.1/bin/cuda-memcheck ./bin/UnitTests_vtkm_cont_cuda_testing UnitTestCudaArrayHandle
========= CUDA-MEMCHECK
*** vtkm::UInt8 ***************
Try operations on empty arrays.
*** vtkm::Int64 ***************
Try operations on empty arrays.
*** vtkm::Float32 ***************
Try operations on empty arrays.
*** vtkm::Vec< vtkm::Float64, 3 > ***************
Try operations on empty arrays.
*** vtkm::UInt8 ***************
Check array with user provided memory.
Check out execution array behavior.
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaPointerGetAttributes. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x38c7d3]
=========     Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0xf047f9]
...
=========     Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x651bc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x64d9a]
=========
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaGetLastError. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x38c7d3]
=========     Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0xf0e3d3]
...
=========     Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x651bc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x64d9a]
=========
========= Program hit cudaErrorInvalidDeviceFunction (error 98) due to "invalid device function" on CUDA API call to cudaLaunchKernel. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x38c7d3]
=========     Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0xf13615]
...
=========     Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x651bc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x64d9a]
=========
========= Program hit cudaErrorInvalidDeviceFunction (error 98) due to "invalid device function" on CUDA API call to cudaGetLastError. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x38c7d3]
=========     Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0xf0e3d3]
...
=========     Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x71ca0]
***** Uncaught VTKm exception thrown.
=========     Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x71515]
CUDA Error: invalid device function
=========     Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x710fc]
Unchecked asynchronous error @ /home/4nt/vtk-m/vtkm/cont/cuda/internal/CudaAllocator.cu:110
=========     Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x70c19]
...
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x64d9a]
=========
========= ERROR SUMMARY: 4 errors
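Since the host backtraces make the log hard to scan, here is a small sketch that pulls out only the `Program hit ...` lines in order. `summarize_memcheck` is a hypothetical helper, and the regular expression assumes the line format seen in the log above. As far as I know, CUDA toolkits before 11.0 report `cudaErrorInvalidValue` from `cudaPointerGetAttributes` for ordinary host pointers that were not allocated or registered through CUDA, so the first two hits may be benign and the `cudaLaunchKernel` failure may be the real problem; that reading is an assumption, not something the log confirms.

```python
import re

# Matches cuda-memcheck lines of the form seen in the log above:
# ========= Program hit <errorName> (error <N>) due to "<message>" on CUDA API call to <apiCall>.
HIT_RE = re.compile(
    r'^=+\s+Program hit (\w+) \(error (\d+)\) due to "([^"]+)" '
    r'on CUDA API call to (\w+)\.'
)


def summarize_memcheck(log_text):
    """Return (api_call, error_name, error_code) tuples in order of appearance."""
    hits = []
    for line in log_text.splitlines():
        match = HIT_RE.match(line.strip())
        if match:
            error_name, code, _message, api_call = match.groups()
            hits.append((api_call, error_name, int(code)))
    return hits


# The four "Program hit" lines copied from the log above, backtraces omitted.
sample = """\
========= CUDA-MEMCHECK
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaPointerGetAttributes.
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaGetLastError.
========= Program hit cudaErrorInvalidDeviceFunction (error 98) due to "invalid device function" on CUDA API call to cudaLaunchKernel.
========= Program hit cudaErrorInvalidDeviceFunction (error 98) due to "invalid device function" on CUDA API call to cudaGetLastError.
========= ERROR SUMMARY: 4 errors
"""

hits = summarize_memcheck(sample)
for api_call, error_name, code in hits:
    print(f"{api_call}: {error_name} ({code})")
```

Each `cudaGetLastError` hit repeats the error from the call before it, which is why four reported errors correspond to only two distinct failures.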

System info:

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.2 LTS
Release:	18.04
Codename:	bionic
$ nvidia-smi
Mon Jul 22 10:23:55 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.34       Driver Version: 430.34       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:65:00.0  On |                  N/A |
|  0%   45C    P0    30W / 185W |   1213MiB /  7979MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1285      G   /usr/bin/gnome-shell                         176MiB |
|    0      1998      G   /usr/lib/xorg/Xorg                           482MiB |
|    0      2129      G   /usr/bin/gnome-shell                         386MiB |
|    0      6945      G   ...-token=BAB748F2325B6E879753DBB4E9D9726C   134MiB |
+-----------------------------------------------------------------------------+

Commit that I built to reproduce: 468ee61c

Edited by Nick Thompson