FindCUDA: Race condition (introduced by wrong dependencies?)
The issue was reported e.g. here: https://github.com/horovod/horovod/issues/2358
Basically what is done is simply:
cuda_add_library(horovod_cuda_kernels cuda_kernels.cu OPTIONS -D_GLIBCXX_USE_CXX11_ABI=1)
cuda_add_library(compatible_horovod_cuda_kernels cuda_kernels.cu OPTIONS -D_GLIBCXX_USE_CXX11_ABI=0)
Somehow a parallel build then tries to create the dependency information twice:
-- Generating dependency file: /tmp/pip-install-xqfpq4y3/horovod/build/temp.linux-x86_64-3.6/horovod/common/ops/cuda/CMakeFiles/horovod_cuda_kernels.dir//horovod_cuda_kernels_generated_cuda_kernels.cu.o.NVCC-depend
make[2]: Entering directory '/tmp/pip-install-xqfpq4y3/horovod/build/temp.linux-x86_64-3.6'
Scanning dependencies of target compatible_gloo
-- Generating dependency file: /tmp/pip-install-xqfpq4y3/horovod/build/temp.linux-x86_64-3.6/horovod/common/ops/cuda/CMakeFiles/horovod_cuda_kernels.dir//horovod_cuda_kernels_generated_cuda_kernels.cu.o.NVCC-depend
As the script (run_nvcc.cmake) deletes the file after it is done, the second invocation then fails to find the file and errors out, breaking the build.
I haven't found anything obvious but from what I see:
- There is only 1 generated script (out of run_nvcc.cmake) for that file
- add_library(${cuda_target} ${_cmake_options} ${_generated_files} ...) is only called once with the _generated_files set to horovod_cuda_kernels_generated_cuda_kernels.cu.o
- add_custom_command is only added once with that output file
Hence my conclusion so far is that something in CMake messes up the dependency chain and invokes the custom_command twice.
This is with CMake 3.15.3, but it also reproduces with 3.19.3.
I suspect the problem is caused by adding 2 libraries with the same source file, where something in the dependency chain is not correctly prefixed with the target name. As far as I can tell, removing the second cuda_add_library fixes the issue.
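For reference, the setup described above can be reduced to roughly the following CMakeLists.txt. This is only a sketch: the project name is made up, it assumes a cuda_kernels.cu file sits next to it, and I have not confirmed that this stripped-down form triggers the race on its own (the real Horovod build has more targets around it).

```cmake
# Hypothetical minimal project; the name "findcuda_race" is an assumption.
cmake_minimum_required(VERSION 3.15)
project(findcuda_race LANGUAGES CXX)

# The legacy FindCUDA module provides cuda_add_library().
find_package(CUDA REQUIRED)

# Two libraries built from the same .cu file, differing only in the ABI define.
cuda_add_library(horovod_cuda_kernels cuda_kernels.cu
                 OPTIONS -D_GLIBCXX_USE_CXX11_ABI=1)
cuda_add_library(compatible_horovod_cuda_kernels cuda_kernels.cu
                 OPTIONS -D_GLIBCXX_USE_CXX11_ABI=0)
```

The failure only shows up in parallel builds, so this would have to be built with something like make -j8 and watched for the "Generating dependency file" line appearing twice for the same .NVCC-depend path.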