
Clang CUDA on Windows doesn't build without manual flags and standard paths

I have just tried to use CMake 3.17.20200520-g81e8f627 and clang to compile a CUDA project we normally build with nvcc. I bumped into a few issues:

  • CMake seems to ignore the CMAKE_CUDA_COMPILER_WORKS option, so the compiler check is always performed. (Update: this is intended.)
  • At least during the compiler test, CMake does not seem to forward the CMAKE_CUDA_ARCHITECTURES values to clang; I have to set them manually with CMAKE_CUDA_FLAGS=--cuda-gpu-arch=XXX. Otherwise it tries to compile for sm_20, which my CUDA installation (CUDA 10.1) apparently doesn't support. (Update: this happened because --cuda-path wasn't set; see below.)
  • Related to the above, I don't know how to make CMake compile for multiple GPU architectures at once. CMAKE_CUDA_ARCHITECTURES can contain a list of architectures (since nvcc can compile for several architectures at once), but since clang can only compile for one architecture at a time with --cuda-gpu-arch, I cannot build multiple architectures with the trick above. (Update: the trick is no longer required.)
  • CMake does not seem to forward the path to the CUDA installation to clang; I have to set it manually with CMAKE_CUDA_FLAGS=--cuda-path=XXX.
  • When linking the test program, the linker cannot find the CUDA .lib files because the library search path is set to just <CUDA_PATH>/lib (which is empty) instead of <CUDA_PATH>/lib/x64 (which contains the 64-bit .lib files). I worked around this by copying the .lib files into the lib folder.
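Until CMake forwards these settings itself, the manual workaround from the bullets above boils down to a single CMAKE_CUDA_FLAGS string. The sketch below builds that string from a CMAKE_CUDA_ARCHITECTURES-style list. The helper name clang_cuda_flags is hypothetical, and the sketch assumes clang accepts --cuda-gpu-arch more than once (one occurrence per target GPU), as clang's CUDA documentation describes; "-real"/"-virtual" suffixes are simply dropped as a simplification.

```python
def clang_cuda_flags(cuda_path, architectures):
    """Build a CMAKE_CUDA_FLAGS value for clang from a CMake-style arch list.

    `architectures` follows the CMAKE_CUDA_ARCHITECTURES syntax, e.g. "30;52".
    A "-real"/"-virtual" suffix is dropped (a simplification of this sketch).
    """
    flags = [f"--cuda-path={cuda_path}"]
    for entry in architectures.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty list items such as a trailing ";"
        number = entry.split("-", 1)[0]
        # One --cuda-gpu-arch per target GPU; clang accepts the option repeatedly.
        flags.append(f"--cuda-gpu-arch=sm_{number}")
    return " ".join(flags)


if __name__ == "__main__":
    # Values taken from the report: CUDA installed in C:/mingw/cuda, sm_30 target.
    print(clang_cuda_flags("C:/mingw/cuda", "30"))
    print(clang_cuda_flags("C:/mingw/cuda", "30;52"))
```

The resulting string would then be passed at configure time, for example as -DCMAKE_CUDA_FLAGS="--cuda-path=C:/mingw/cuda --cuda-gpu-arch=sm_30". Paths containing spaces would need quoting, which this sketch does not attempt.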

After that last step, CMake decided the compiler was working.
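Rather than copying the .lib files, the linker could be pointed at the x64 subfolder directly. A minimal sketch, assuming lld-link honors the MSVC-style /libpath: option passed through clang's -Xlinker (the same way CMake already passes /implib: and /pdb:), and that the flag is supplied via CMAKE_EXE_LINKER_FLAGS for the compiler test (and CMAKE_SHARED_LINKER_FLAGS for the DLL). Both helper names are hypothetical.

```python
from pathlib import Path


def cuda_lib_dir(cuda_path):
    """Return the folder holding the 64-bit CUDA import libraries.

    Prefers <CUDA_PATH>/lib/x64 (where the .lib files are on this install)
    and falls back to <CUDA_PATH>/lib if no x64 subfolder exists.
    """
    lib = Path(cuda_path) / "lib"
    x64 = lib / "x64"
    return x64 if x64.is_dir() else lib


def libpath_flag(cuda_path):
    """Linker option adding the CUDA library folder to lld-link's search path."""
    # -Xlinker forwards the next argument verbatim to the linker;
    # /libpath: is the MSVC-style search-path option.
    return f"-Xlinker /libpath:{cuda_lib_dir(cuda_path).as_posix()}"


if __name__ == "__main__":
    print(libpath_flag("C:/mingw/cuda"))
```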

Then I went on to compile our project, a shared library, with CMAKE_BUILD_TYPE=RelWithDebInfo and the MinGW Makefiles generator. Compilation went fine, but at link time the objects disagreed on the runtime library: the CXX files correctly used MD_DynamicRelease, while the CU files used MT_StaticRelease. The commands below show why.

With VERBOSE=1, mingw32-make reports the following CXX compiler command:

C:\PROGRA~1\LLVM\bin\CLANG_~1.EXE <CUSTOM_PREPROCESSOR> <CUSTOM_INCLUDE_PATHS> -O2 -g -DNDEBUG -Xclang -gcodeview -D_DLL -D_MT -Xclang --dependent-lib=msvcrt <CUSTOM_COMPILER_OPTS> -o <OUTPUT_FILE> -c <CXX_FILE>

and the following CU compiler command:

C:\PROGRA~1\LLVM\bin\CLANG_~1.EXE <CUSTOM_PREPROCESSOR> <CUSTOM_INCLUDE_PATHS> --cuda-gpu-arch=sm_30 --cuda-path=C:/mingw/cuda --cuda-gpu-arch=sm_30 <CUSTOM_COMPILER_OPTS> -std=gnu++14 -x cuda -c <CUDA_FILE> -o <OUTPUT_FILE>

and for the linker:

C:\PROGRA~1\LLVM\bin\CLANG_~1.EXE -fuse-ld=lld-link -nostartfiles -nostdlib -O2 -g -DNDEBUG -Xclang -gcodeview -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -shared -o <OUTPUT_DLL>  -Xlinker /implib:<LIBRARY>.lib -Xlinker /pdb:<LIBRARY>.pdb -Xlinker /version:0.0 <OBJECTS> <LINK_LIBS>

Notice that the CU compilation command lacks not only -D_DLL -D_MT (and -Xclang --dependent-lib=msvcrt) but also other important flags such as -O2 -g. Notice also that --cuda-gpu-arch=sm_30 now appears twice; I guess one is the one I added manually in CMAKE_CUDA_FLAGS (to make the compiler test pass) and CMake added the other.
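The difference between the two compile commands can be checked mechanically. The sketch below tokenizes the command lines copied from the VERBOSE=1 output (placeholders kept as-is), keeps each -Xclang together with its argument, and lists the flags that appear in the CXX command but not in the CU command, plus any flag repeated in the CU command. The helper names are hypothetical, and the whitespace split assumes no token contains spaces, which holds for these lines.

```python
from collections import Counter

# Command lines copied from the VERBOSE=1 output in this report.
CXX_CMD = (r"C:\PROGRA~1\LLVM\bin\CLANG_~1.EXE <CUSTOM_PREPROCESSOR> <CUSTOM_INCLUDE_PATHS> "
           r"-O2 -g -DNDEBUG -Xclang -gcodeview -D_DLL -D_MT -Xclang --dependent-lib=msvcrt "
           r"<CUSTOM_COMPILER_OPTS> -o <OUTPUT_FILE> -c <CXX_FILE>")
CU_CMD = (r"C:\PROGRA~1\LLVM\bin\CLANG_~1.EXE <CUSTOM_PREPROCESSOR> <CUSTOM_INCLUDE_PATHS> "
          r"--cuda-gpu-arch=sm_30 --cuda-path=C:/mingw/cuda --cuda-gpu-arch=sm_30 "
          r"<CUSTOM_COMPILER_OPTS> -std=gnu++14 -x cuda -c <CUDA_FILE> -o <OUTPUT_FILE>")


def flags(command):
    """Return the option tokens of a command line, keeping -Xclang pairs together."""
    tokens = command.split()
    result = []
    i = 1  # skip the compiler executable
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-Xclang" and i + 1 < len(tokens):
            result.append(f"-Xclang {tokens[i + 1]}")
            i += 2
            continue
        if tok.startswith("-"):
            result.append(tok)  # option arguments such as "cuda" after -x are skipped
        i += 1
    return result


def missing_flags(reference, other):
    """Flags present in `reference` but absent from `other`, in reference order."""
    present = set(flags(other))
    return [f for f in flags(reference) if f not in present]


def duplicated_flags(command):
    """Flags that occur more than once in `command`."""
    return [f for f, n in Counter(flags(command)).items() if n > 1]


if __name__ == "__main__":
    print("missing from CU:", missing_flags(CXX_CMD, CU_CMD))
    print("duplicated in CU:", duplicated_flags(CU_CMD))
```

Among the missing entries, -D_DLL, -D_MT and -Xclang --dependent-lib=msvcrt are the ones that make the CXX objects request the dynamic runtime (msvcrt), which is consistent with the MD_DynamicRelease vs MT_StaticRelease mismatch reported at link time.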

Edited by Corentin Schreiber