Clang CUDA on Windows doesn't build without manual flags and standard paths
I have just tried to use CMake 3.17.20200520-g81e8f627 and clang to compile a CUDA project we normally build with nvcc. I bumped into a few issues:
- CMake seems to ignore the CMAKE_CUDA_COMPILER_WORKS option, so the compiler check is always performed. (Edit: This is intended.)
- At least during the compiler test, CMake does not seem to forward the CMAKE_CUDA_ARCHITECTURES values to clang; I have to set them manually using CMAKE_CUDA_FLAGS=--cuda-gpu-arch=XXX. Otherwise it tries to compile for sm_20, which my CUDA installation (CUDA 10.1) apparently doesn't support. (Edit: This was because cuda-path wasn't set, see below.)
- Related to the above, I don't know how to make CMake compile for multiple GPU architectures at once. CMAKE_CUDA_ARCHITECTURES can contain a list of architectures (since nvcc can compile for several architectures at once), but since clang can only compile for one architecture at a time with --cuda-gpu-arch, I cannot build multiple architectures with the trick above. (Edit: Trick no longer required.)
- CMake does not seem to forward the path to the CUDA installation to clang; I have to set it manually using CMAKE_CUDA_FLAGS=--cuda-path=XXX.
- When linking the test program, the linker cannot find the CUDA .lib files because the library search path is set to just <CUDA_PATH>/lib (which is empty) instead of <CUDA_PATH>/lib/x64 (which contains the 64-bit .lib files). I copied the .lib files manually into the lib folder.
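The manual workarounds in the list above boil down to assembling extra clang flags and a library directory by hand. Below is a minimal Python sketch of that bookkeeping. The helper names are hypothetical and not part of CMake, C:/mingw/cuda is just the CUDA path that appears in the commands further down, and list entries are assumed to be plain numbers such as 30 or 70. clang's CUDA documentation states that --cuda-gpu-arch may be passed more than once, so the sketch emits one flag per architecture.

```python
from __future__ import annotations

from pathlib import PureWindowsPath


def cuda_arch_flags(architectures: str) -> list[str]:
    """Turn a CMake-style list such as "30;70" into one
    --cuda-gpu-arch=sm_XX flag per entry (hypothetical helper).
    Assumes plain numeric entries."""
    flags = []
    for arch in architectures.split(";"):
        arch = arch.strip()
        if arch:  # skip empty list elements
            flags.append(f"--cuda-gpu-arch=sm_{arch}")
    return flags


def cuda_flags(architectures: str, cuda_path: str) -> str:
    """Build a CMAKE_CUDA_FLAGS value: the architectures plus --cuda-path."""
    return " ".join(cuda_arch_flags(architectures) + [f"--cuda-path={cuda_path}"])


def cuda_lib_dir(cuda_path: str) -> str:
    """Directory holding the 64-bit CUDA import libraries on Windows,
    i.e. <CUDA_PATH>/lib/x64 rather than <CUDA_PATH>/lib."""
    return PureWindowsPath(cuda_path, "lib", "x64").as_posix()


print(cuda_flags("30;70", "C:/mingw/cuda"))
# --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_70 --cuda-path=C:/mingw/cuda
print(cuda_lib_dir("C:/mingw/cuda"))
# C:/mingw/cuda/lib/x64
```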
After that last step, CMake decided the compiler was working.
Then I went on to compile our project, a shared library, with CMAKE_BUILD_TYPE=RelWithDebInfo and the MinGW Makefiles generator. Compilation went fine; however, at link time there were inconsistencies in the runtime libraries being used: the CXX files correctly used MD_DynamicRelease, but the CU files used MT_StaticRelease. This can be seen in the commands below.
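MD_DynamicRelease and MT_StaticRelease are RuntimeLibrary values that the MSVC C++ standard library headers record with #pragma detect_mismatch, and the linker rejects objects whose values disagree. Which value a file gets depends on whether _DLL and _DEBUG are defined. The simplified Python model below (it ignores rarer switches such as _STATIC_CPPLIB and is not the headers' actual code) shows why a translation unit compiled without -D_DLL ends up as MT_StaticRelease.

```python
from __future__ import annotations


def runtime_library(macros: set[str]) -> str:
    """Simplified model of the RuntimeLibrary value chosen by the MSVC STL
    headers: _DLL selects the dynamic CRT, _DEBUG selects the debug variant."""
    debug = "_DEBUG" in macros
    if "_DLL" in macros:
        return "MDd_DynamicDebug" if debug else "MD_DynamicRelease"
    return "MTd_StaticDebug" if debug else "MT_StaticRelease"


# CXX files: the RelWithDebInfo command defines NDEBUG, _DLL and _MT.
print(runtime_library({"NDEBUG", "_DLL", "_MT"}))  # MD_DynamicRelease
# CU files: none of these macros are defined.
print(runtime_library(set()))  # MT_StaticRelease
```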
With VERBOSE=1, mingw32-make reports the following CXX compiler command:
C:\PROGRA~1\LLVM\bin\CLANG_~1.EXE <CUSTOM_PREPROCESSOR> <CUSTOM_INCLUDE_PATHS> -O2 -g -DNDEBUG -Xclang -gcodeview -D_DLL -D_MT -Xclang --dependent-lib=msvcrt <CUSTOM_COMPILER_OPTS> -o <OUTPUT_FILE> -c <CXX_FILE>
and the following CU compiler command:
C:\PROGRA~1\LLVM\bin\CLANG_~1.EXE <CUSTOM_PREPROCESSOR> <CUSTOM_INCLUDE_PATHS> --cuda-gpu-arch=sm_30 --cuda-path=C:/mingw/cuda --cuda-gpu-arch=sm_30 <CUSTOM_COMPILER_OPTS> -std=gnu++14 -x cuda -c <CUDA_FILE> -o <OUTPUT_FILE>
and for the linker:
C:\PROGRA~1\LLVM\bin\CLANG_~1.EXE -fuse-ld=lld-link -nostartfiles -nostdlib -O2 -g -DNDEBUG -Xclang -gcodeview -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -shared -o <OUTPUT_DLL> -Xlinker /implib:<LIBRARY>.lib -Xlinker /pdb:<LIBRARY>.pdb -Xlinker /version:0.0 <OBJECTS> <LINK_LIBS>
Notice that -D_DLL -D_MT are missing from the CU compilation command, along with other important compilation flags such as -O2 -g. Notice also that --cuda-gpu-arch=sm_30 now appears twice; I guess one is the one I added manually in CMAKE_CUDA_FLAGS (to make the compiler test pass) and the other was added by CMake somehow.
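Comparing the two compile commands token by token makes both problems easy to spot. A small Python sketch, assuming plain whitespace splitting is enough for these command lines (it does not handle quoting) and using the flag excerpts quoted above with the placeholders, compiler path and file names left out. Options such as -Xclang and -x are paired with their argument so that, for example, "-Xclang -gcodeview" counts as one flag.

```python
from __future__ import annotations

from collections import Counter

# Flag excerpts from the CXX and CU compile commands above.
CXX_FLAGS = "-O2 -g -DNDEBUG -Xclang -gcodeview -D_DLL -D_MT -Xclang --dependent-lib=msvcrt"
CU_FLAGS = "--cuda-gpu-arch=sm_30 --cuda-path=C:/mingw/cuda --cuda-gpu-arch=sm_30 -std=gnu++14 -x cuda"

# Options whose value is the following token; kept together as one unit.
TAKES_ARG = {"-Xclang", "-Xlinker", "-x"}


def tokenize(command: str) -> list[str]:
    """Split on whitespace, pairing options in TAKES_ARG with their argument."""
    parts = command.split()
    tokens = []
    i = 0
    while i < len(parts):
        if parts[i] in TAKES_ARG and i + 1 < len(parts):
            tokens.append(f"{parts[i]} {parts[i + 1]}")
            i += 2
        else:
            tokens.append(parts[i])
            i += 1
    return tokens


def missing_flags(reference: str, other: str) -> list[str]:
    """Flags in `reference` that never appear in `other`, in original order."""
    present = set(tokenize(other))
    return [t for t in dict.fromkeys(tokenize(reference)) if t not in present]


def duplicated_flags(command: str) -> list[str]:
    """Flags that appear more than once on a command line."""
    return [t for t, n in Counter(tokenize(command)).items() if n > 1]


print(missing_flags(CXX_FLAGS, CU_FLAGS))
# ['-O2', '-g', '-DNDEBUG', '-Xclang -gcodeview', '-D_DLL', '-D_MT', '-Xclang --dependent-lib=msvcrt']
print(duplicated_flags(CU_FLAGS))
# ['--cuda-gpu-arch=sm_30']
```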