CUDA: separable compilation (relocatable device code): exe doesn't see the lib functions
In the attached archive (the sources are also available at https://github.com/srogatch/cudalink_bug ) you can find a small CMake-based CUDA program consisting of a static library and an executable. The static library contains a __device__ function that is exported to the executable. However, linking the executable fails with an "Undefined reference" error for that function, even though CUDA_SEPARABLE_COMPILATION, CUDA_RESOLVE_DEVICE_SYMBOLS and POSITION_INDEPENDENT_CODE are all ON. The platform is Ubuntu 20.04, the C++ compiler is g++ 9.3.0, CUDA is 11.3, and CMake is 3.20.2.
cudalink_bug_2021-05-21.tar.gz
The output of make is provided below for convenience:
make VERBOSE=1
/usr/bin/cmake -S/scratch/src/reproduce/cudalink_bug -B/scratch/src/reproduce/cudalink_bug/build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /scratch/src/reproduce/cudalink_bug/build/CMakeFiles /scratch/src/reproduce/cudalink_bug/build//CMakeFiles/progress.marks
make -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/scratch/src/reproduce/cudalink_bug/build'
make -f libdemo/CMakeFiles/cudalink_demo.dir/build.make libdemo/CMakeFiles/cudalink_demo.dir/depend
make[2]: Entering directory '/scratch/src/reproduce/cudalink_bug/build'
cd /scratch/src/reproduce/cudalink_bug/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /scratch/src/reproduce/cudalink_bug /scratch/src/reproduce/cudalink_bug/libdemo /scratch/src/reproduce/cudalink_bug/build /scratch/src/reproduce/cudalink_bug/build/libdemo /scratch/src/reproduce/cudalink_bug/build/libdemo/CMakeFiles/cudalink_demo.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/scratch/src/reproduce/cudalink_bug/build'
make -f libdemo/CMakeFiles/cudalink_demo.dir/build.make libdemo/CMakeFiles/cudalink_demo.dir/build
make[2]: Entering directory '/scratch/src/reproduce/cudalink_bug/build'
[ 12%] Building CXX object libdemo/CMakeFiles/cudalink_demo.dir/demo2.cpp.o
cd /scratch/src/reproduce/cudalink_bug/build/libdemo && /usr/bin/c++ -isystem /usr/local/cuda-11.3/targets/x86_64-linux/include -g -fPIC -MD -MT libdemo/CMakeFiles/cudalink_demo.dir/demo2.cpp.o -MF CMakeFiles/cudalink_demo.dir/demo2.cpp.o.d -o CMakeFiles/cudalink_demo.dir/demo2.cpp.o -c /scratch/src/reproduce/cudalink_bug/libdemo/demo2.cpp
[ 25%] Building CUDA object libdemo/CMakeFiles/cudalink_demo.dir/demo1.cu.o
cd /scratch/src/reproduce/cudalink_bug/build/libdemo && /usr/local/cuda-11.3/bin/nvcc -forward-unknown-to-host-compiler -isystem=/usr/local/cuda-11.3/targets/x86_64-linux/include -g --generate-code=arch=compute_75,code=[compute_75,sm_75] -Xcompiler=-fPIC -gencode=arch=compute_75,code=sm_75 -std=c++14 -MD -MT libdemo/CMakeFiles/cudalink_demo.dir/demo1.cu.o -MF CMakeFiles/cudalink_demo.dir/demo1.cu.o.d -x cu -dc /scratch/src/reproduce/cudalink_bug/libdemo/demo1.cu -o CMakeFiles/cudalink_demo.dir/demo1.cu.o
[ 37%] Linking CUDA device code CMakeFiles/cudalink_demo.dir/cmake_device_link.o
cd /scratch/src/reproduce/cudalink_bug/build/libdemo && /usr/bin/cmake -E cmake_link_script CMakeFiles/cudalink_demo.dir/dlink.txt --verbose=1
/usr/local/cuda-11.3/bin/nvcc -forward-unknown-to-host-compiler -g --generate-code=arch=compute_75,code=[compute_75,sm_75] -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink CMakeFiles/cudalink_demo.dir/demo2.cpp.o CMakeFiles/cudalink_demo.dir/demo1.cu.o -o CMakeFiles/cudalink_demo.dir/cmake_device_link.o -L/usr/local/cuda-11.3/targets/x86_64-linux/lib/stubs -L/usr/local/cuda-11.3/targets/x86_64-linux/lib -lcudadevrt -lcudart_static -lrt -lpthread -ldl
[ 50%] Linking CXX static library libcudalink_demo.a
cd /scratch/src/reproduce/cudalink_bug/build/libdemo && /usr/bin/cmake -P CMakeFiles/cudalink_demo.dir/cmake_clean_target.cmake
cd /scratch/src/reproduce/cudalink_bug/build/libdemo && /usr/bin/cmake -E cmake_link_script CMakeFiles/cudalink_demo.dir/link.txt --verbose=1
/usr/bin/ar qc libcudalink_demo.a CMakeFiles/cudalink_demo.dir/demo2.cpp.o CMakeFiles/cudalink_demo.dir/demo1.cu.o
/usr/bin/ar q libcudalink_demo.a CMakeFiles/cudalink_demo.dir/cmake_device_link.o
/usr/bin/ranlib libcudalink_demo.a
make[2]: Leaving directory '/scratch/src/reproduce/cudalink_bug/build'
[ 50%] Built target cudalink_demo
make -f exedemo/CMakeFiles/cudalink_exe.dir/build.make exedemo/CMakeFiles/cudalink_exe.dir/depend
make[2]: Entering directory '/scratch/src/reproduce/cudalink_bug/build'
cd /scratch/src/reproduce/cudalink_bug/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /scratch/src/reproduce/cudalink_bug /scratch/src/reproduce/cudalink_bug/exedemo /scratch/src/reproduce/cudalink_bug/build /scratch/src/reproduce/cudalink_bug/build/exedemo /scratch/src/reproduce/cudalink_bug/build/exedemo/CMakeFiles/cudalink_exe.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/scratch/src/reproduce/cudalink_bug/build'
make -f exedemo/CMakeFiles/cudalink_exe.dir/build.make exedemo/CMakeFiles/cudalink_exe.dir/build
make[2]: Entering directory '/scratch/src/reproduce/cudalink_bug/build'
[ 62%] Building CXX object exedemo/CMakeFiles/cudalink_exe.dir/exe2.cpp.o
cd /scratch/src/reproduce/cudalink_bug/build/exedemo && /usr/bin/c++ -isystem /usr/local/cuda-11.3/targets/x86_64-linux/include -g -fPIE -MD -MT exedemo/CMakeFiles/cudalink_exe.dir/exe2.cpp.o -MF CMakeFiles/cudalink_exe.dir/exe2.cpp.o.d -o CMakeFiles/cudalink_exe.dir/exe2.cpp.o -c /scratch/src/reproduce/cudalink_bug/exedemo/exe2.cpp
[ 75%] Building CUDA object exedemo/CMakeFiles/cudalink_exe.dir/exe1.cu.o
cd /scratch/src/reproduce/cudalink_bug/build/exedemo && /usr/local/cuda-11.3/bin/nvcc -forward-unknown-to-host-compiler -isystem=/usr/local/cuda-11.3/targets/x86_64-linux/include -g --generate-code=arch=compute_75,code=[compute_75,sm_75] -Xcompiler=-fPIE -gencode=arch=compute_75,code=sm_75 -std=c++14 -MD -MT exedemo/CMakeFiles/cudalink_exe.dir/exe1.cu.o -MF CMakeFiles/cudalink_exe.dir/exe1.cu.o.d -x cu -dc /scratch/src/reproduce/cudalink_bug/exedemo/exe1.cu -o CMakeFiles/cudalink_exe.dir/exe1.cu.o
[ 87%] Linking CUDA device code CMakeFiles/cudalink_exe.dir/cmake_device_link.o
cd /scratch/src/reproduce/cudalink_bug/build/exedemo && /usr/bin/cmake -E cmake_link_script CMakeFiles/cudalink_exe.dir/dlink.txt --verbose=1
/usr/local/cuda-11.3/bin/nvcc -forward-unknown-to-host-compiler -g --generate-code=arch=compute_75,code=[compute_75,sm_75] -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink CMakeFiles/cudalink_exe.dir/exe2.cpp.o CMakeFiles/cudalink_exe.dir/exe1.cu.o -o CMakeFiles/cudalink_exe.dir/cmake_device_link.o -L/usr/local/cuda-11.3/targets/x86_64-linux/lib/stubs -L/usr/local/cuda-11.3/targets/x86_64-linux/lib -lcudadevrt -lcudart_static -lrt -lpthread -ldl
nvlink error : Undefined reference to '_ZN4Demo5Test1Eii' in 'CMakeFiles/cudalink_exe.dir/exe1.cu.o'
make[2]: *** [exedemo/CMakeFiles/cudalink_exe.dir/build.make:114: exedemo/CMakeFiles/cudalink_exe.dir/cmake_device_link.o] Error 255
make[2]: Leaving directory '/scratch/src/reproduce/cudalink_bug/build'
make[1]: *** [CMakeFiles/Makefile2:142: exedemo/CMakeFiles/cudalink_exe.dir/all] Error 2
make[1]: Leaving directory '/scratch/src/reproduce/cudalink_bug/build'
make: *** [Makefile:91: all] Error 2
The static library has the __device__ function in it. Below is a piece of the output from nm:
000000000000001a T _ZN4Demo5Test1Eii
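For readers matching the symbol in the nvlink error and in the nm output to the source, the mangled name can be decoded with the GCC C++ ABI demangler (the same thing c++filt does). The snippet below is only a minimal sketch: the helper name demangle is hypothetical and not part of the reproducer, and it assumes a GCC or Clang toolchain that provides <cxxabi.h>.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (Itanium C++ ABI, GCC/Clang)
#include <cstdlib>    // std::free
#include <memory>     // std::unique_ptr
#include <string>

// Hypothetical helper (not from the reproducer): returns the demangled form
// of an Itanium-ABI symbol name, or the input unchanged if demangling fails.
std::string demangle(const char* mangled) {
    int status = 0;
    // __cxa_demangle returns a malloc'ed buffer, so it is released with free.
    std::unique_ptr<char, void (*)(void*)> text(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
    return (status == 0 && text) ? std::string(text.get())
                                 : std::string(mangled);
}
```

Calling demangle("_ZN4Demo5Test1Eii") yields Demo::Test1(int, int), which is the function that the device link step of the executable cannot resolve.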