VTK-m needs a way to express a max number of parallel CUDA tests per GPU
When testing in parallel it is possible that too many executables that use CUDA are running at once. This then causes tests to fail with the following error:
```
2019-01-17 09:27:39.016 ( 0.005s) [main thread ] loguru.hpp:1969 Info| arguments: UnitTestStreamLineUniformGrid --device=Cuda
2019-01-17 09:27:39.017 ( 0.005s) [main thread ] loguru.hpp:1972 Info| Current dir: /home/kitware/buildslave/root/vtk-m-adora-linux-static-release_cuda_host_gcc_5_cuda_native_examples_gcc_logging/build/vtkm/worklet/testing
2019-01-17 09:27:39.017 ( 0.005s) [main thread ] loguru.hpp:1974 Info| stderr verbosity: 0
2019-01-17 09:27:39.017 ( 0.005s) [main thread ] loguru.hpp:1975 Info| -----------------------------------
2019-01-17 09:27:39.017 ( 0.005s) [main thread ] Logging.cxx:138 Info| Logging initialized.
2019-01-17 09:27:48.313 ( 9.301s) [main thread ]RuntimeDeviceTracker.cx:159 Info| Setting device 'Cuda' to 0
2019-01-17 09:27:48.313 ( 9.301s) [main thread ]RuntimeDeviceTracker.cx:159 Info| Setting device 'TBB' to 0
2019-01-17 09:27:48.313 ( 9.301s) [main thread ]RuntimeDeviceTracker.cx:159 Info| Setting device 'OpenMP' to 0
2019-01-17 09:27:48.313 ( 9.301s) [main thread ]RuntimeDeviceTracker.cx:159 Info| Setting device 'Serial' to 1
2019-01-17 09:27:48.313 ( 9.302s) [main thread ] Initialize.cxx:73 ERR| Unavailable device specificed after option '--device': 'Cuda'.
Valid devices are: "Any" "Serial"
```
To help alleviate this we have done the following:

- We have made `DeviceAdapterRuntimeDetectorCuda` handle hardware that is at max capacity gracefully. Previously it would permanently disable use of that GPU; now it ignores that the hardware is at capacity and presumes that the device will be ready by the first kernel launch. ( !1533 (merged) )
- We have set up rules in our buildbot infrastructure to re-run in serial any test that fails in parallel. This works around the max-capacity issue, but increases the test turn-around time.
This is not only a CUDA issue. The OpenMP tests see inverse scaling when run in parallel, which is why they are currently all run serially.
What needs to be done:

- We need support for passing OpenMP and TBB tests the number of threads to use on the command line.
- We need support for telling CUDA tests which GPU device to execute on.
- We need to watch the work that other people are doing to help CTest fix this issue: