CUDA Unit Tests failing on CUDA 10.1
A large number of unit tests (23 in the run below) are failing on CUDA 10.1.
$ ctest --rerun-failed
Total Test time (real) = 12.26 sec
The following tests FAILED:
66 - UnitTestCudaArrayHandle (Failed)
67 - UnitTestCudaArrayHandleFancy (Failed)
68 - UnitTestCudaArrayHandleVirtualCoordinates (Failed)
69 - UnitTestCudaBitField (Failed)
70 - UnitTestCudaCellLocatorRectilinearGrid (Child aborted)
71 - UnitTestCudaCellLocatorUniformBins (Child aborted)
72 - UnitTestCudaCellLocatorUniformGrid (Failed)
73 - UnitTestCudaComputeRange (Failed)
74 - UnitTestCudaColorTable (Failed)
75 - UnitTestCudaDataSetExplicit (Failed)
76 - UnitTestCudaDataSetSingleType (Child aborted)
77 - UnitTestCudaDeviceAdapter (Failed)
78 - UnitTestCudaGeometry (Failed)
79 - UnitTestCudaImplicitFunction (Failed)
80 - UnitTestCudaMath (Failed)
82 - UnitTestCudaPointLocatorUniformGrid (Child aborted)
83 - UnitTestCudaVirtualObjectHandle (Failed)
109 - UnitTestDataSetBuilderExplicit (Failed)
110 - UnitTestDataSetBuilderRectilinear (Failed)
118 - UnitTestFieldRangeCompute (Failed)
122 - UnitTestMultiBlock (Failed)
131 - UnitTestFieldRangeGlobalCompute (Failed)
133 - UnitTestSerializationDataSet (Failed)
Errors while running CTest
More info on a single failing unit test (it appears that the cause is the same for all of them):
$ ./bin/UnitTests_vtkm_cont_cuda_testing UnitTestCudaArrayHandle
*** vtkm::UInt8 ***************
Try operations on empty arrays.
*** vtkm::Int64 ***************
Try operations on empty arrays.
*** vtkm::Float32 ***************
Try operations on empty arrays.
*** vtkm::Vec< vtkm::Float64, 3 > ***************
Try operations on empty arrays.
*** vtkm::UInt8 ***************
Check array with user provided memory.
Check out execution array behavior.
***** Uncaught VTKm exception thrown.
CUDA Error: invalid device function
Unchecked asynchronous error @ /home/4nt/vtk-m/vtkm/cont/cuda/internal/CudaAllocator.cu:110
And under cuda-memcheck:
$ /usr/local/cuda-10.1/bin/cuda-memcheck ./bin/UnitTests_vtkm_cont_cuda_testing UnitTestCudaArrayHandle
========= CUDA-MEMCHECK
*** vtkm::UInt8 ***************
Try operations on empty arrays.
*** vtkm::Int64 ***************
Try operations on empty arrays.
*** vtkm::Float32 ***************
Try operations on empty arrays.
*** vtkm::Vec< vtkm::Float64, 3 > ***************
Try operations on empty arrays.
*** vtkm::UInt8 ***************
Check array with user provided memory.
Check out execution array behavior.
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaPointerGetAttributes.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x38c7d3]
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0xf047f9]
...
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x651bc]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x64d9a]
=========
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaGetLastError.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x38c7d3]
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0xf0e3d3]
...
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x651bc]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x64d9a]
=========
========= Program hit cudaErrorInvalidDeviceFunction (error 98) due to "invalid device function" on CUDA API call to cudaLaunchKernel.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x38c7d3]
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0xf13615]
...
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x651bc]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x64d9a]
=========
========= Program hit cudaErrorInvalidDeviceFunction (error 98) due to "invalid device function" on CUDA API call to cudaGetLastError.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x38c7d3]
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0xf0e3d3]
...
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x71ca0]
***** Uncaught VTKm exception thrown.
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x71515]
CUDA Error: invalid device function
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x710fc]
Unchecked asynchronous error @ /home/4nt/vtk-m/vtkm/cont/cuda/internal/CudaAllocator.cu:110
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x70c19]
...
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x64d9a]
=========
========= ERROR SUMMARY: 4 errors
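A possible follow-up check, offered as an assumption and not something verified in this report: the CUDA runtime documents cudaErrorInvalidDeviceFunction as meaning the requested device function does not exist or was not compiled for the proper device architecture. The GeForce RTX 2070 is a Turing card (compute capability 7.5, i.e. sm_75), so it may be worth checking whether the test binary embeds sm_75 device code. The sketch below uses `cuobjdump --list-elf` from the CUDA toolkit; the `has_arch` helper is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# Sketch: check whether a binary embeds device code for a given GPU architecture.
# Assumption: the GPU in this report (RTX 2070) needs sm_75 device code.

# Hypothetical helper: reads a `cuobjdump --list-elf` listing on stdin and
# succeeds if any listed cubin names the architecture passed as $1 (e.g. sm_75).
has_arch() {
    grep -qw -- "$1"
}

BIN=./bin/UnitTests_vtkm_cont_cuda_testing

# Only run the real check where cuobjdump and the test binary are available.
if command -v cuobjdump >/dev/null 2>&1 && [ -x "$BIN" ]; then
    if cuobjdump --list-elf "$BIN" | has_arch sm_75; then
        echo "sm_75 device code found in $BIN"
    else
        echo "no sm_75 device code found in $BIN"
    fi
fi
```

If no sm_75 cubin is listed, the binary may still carry PTX that the driver can JIT-compile; `cuobjdump --list-ptx` shows what is embedded. A missing match therefore points at the build's target architecture as something to examine, not as a confirmed cause.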
System info:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS
Release: 18.04
Codename: bionic
$ nvidia-smi
Mon Jul 22 10:23:55 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.34 Driver Version: 430.34 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2070 Off | 00000000:65:00.0 On | N/A |
| 0% 45C P0 30W / 185W | 1213MiB / 7979MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1285 G /usr/bin/gnome-shell 176MiB |
| 0 1998 G /usr/lib/xorg/Xorg 482MiB |
| 0 2129 G /usr/bin/gnome-shell 386MiB |
| 0 6945 G ...-token=BAB748F2325B6E879753DBB4E9D9726C 134MiB |
+-----------------------------------------------------------------------------+
Commit that I built to reproduce: 468ee61c
Edited by Nick Thompson