
CUDA/NVIDIA: fix LTO with explicit architectures

Rafael Stahl requested to merge rafzi/cmake:rafzi-master-patch-62902 into master

The following CMakeLists.txt causes CMake to generate invalid nvcc flags:

cmake_minimum_required(VERSION 3.26)
project(cudalto CXX CUDA)

set(CMAKE_INTERPROCEDURAL_OPTIMIZATION TRUE)

add_executable(cudalto 
  main.cpp
  cu.cu
)

set_target_properties(cudalto PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
set_property(TARGET cudalto PROPERTY CUDA_ARCHITECTURES 75-virtual)

The generated compile command is:

/usr/local/cuda-11.7/bin/nvcc -forward-unknown-to-host-compiler   -std=c++17 --generate-code=arch=compute_75,code=[compute_75lto_75] -MD -MT CMakeFiles/cut.dir/cu.cu.o -MF CMakeFiles/cut.dir/cu.cu.o.d -x cu -rdc=true -c /tmp/cut/cu.cu -o CMakeFiles/cut.dir/cu.cu.o

and nvcc rejects it, because compute_75 and lto_75 are joined in the code=[...] list without a separating comma:

nvcc fatal   : Unsupported gpu architecture 'compute_75lto_75'
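
The failure above comes down to how the entries of the code=[...] list are joined. The following is a minimal sketch of comma-separated joining, not CMake's actual implementation: the function names make_code_list and make_generate_code_flag and their parameters are hypothetical, and it assumes the intended flag for a virtual architecture with LTO enabled is code=[compute_75,lto_75].

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a CMake function): collect the code targets for
// one architecture and join them with commas, so that a virtual target and
// an LTO target never run together as "compute_75lto_75".
std::string make_code_list(const std::string& arch, bool is_virtual,
                           bool is_real, bool lto)
{
  std::vector<std::string> targets;
  if (is_virtual) {
    targets.push_back("compute_" + arch);
  }
  if (lto) {
    // Assumption: with LTO enabled, the compile step emits an lto_ target.
    targets.push_back("lto_" + arch);
  } else if (is_real) {
    targets.push_back("sm_" + arch);
  }

  std::string joined;
  for (std::size_t i = 0; i < targets.size(); ++i) {
    if (i != 0) {
      joined += ",";  // the separator missing from the failing command
    }
    joined += targets[i];
  }
  return joined;
}

// Hypothetical helper: wrap the joined list in the --generate-code flag.
std::string make_generate_code_flag(const std::string& arch, bool is_virtual,
                                    bool is_real, bool lto)
{
  return "--generate-code=arch=compute_" + arch + ",code=[" +
         make_code_list(arch, is_virtual, is_real, lto) + "]";
}
```

For the 75-virtual case from the CMakeLists.txt above, make_generate_code_flag("75", true, false, true) yields --generate-code=arch=compute_75,code=[compute_75,lto_75]. Building a list and joining it keeps the separator logic in one place instead of spreading it across each conditional append.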

Topic-rename: cuda-ipo-arch

Edited by Brad King
