CUDA_SEPARABLE_COMPILATION with mixed CUDA+OpenACC target
I get a host linking error when creating a mixed CUDA+OpenACC target executable using CUDA_SEPARABLE_COMPILATION.
The error can be fixed by including only CUDA objects in the device linking step (or by excluding openacc.o from the cmake_device_link.o object).
I attach a tar.gz file with sources and CMakeLists.txt to reproduce the problem.
DETAILS:
The project is composed of a CUDA source file, an OpenACC source file, and a plain C++ main that calls functions defined in the other two compilation units.
Setting the CUDA_SEPARABLE_COMPILATION property on the executable target adds the -dc flag to CMAKE_CUDA_FLAGS and generates an intermediate device linking step in which the cmake_device_link.o object is created from ALL object files. During the host linking step, I get the following error:
CMakeFiles/test_mpi_cuda_openacc.dir/cmake_device_link.o:(.toc+0x0): undefined reference to `__fatbinwrap_98_cmake_test_openacc_cpp'
pgacclnk: child process exit status 1: /usr/bin/ld
If I disable CUDA_SEPARABLE_COMPILATION but manually add the flags required for separate compilation, the target builds properly.
The CMake steps can be reproduced by hand as follows:
OPENACC_ARCH_FLAGS="-acc=gpu -gpu=cc70 -acc=noautopar -Minfo=accel"
CUDA_ARCH_FLAGS="--generate-code=arch=compute_70,code=[compute_70,sm_70]"
CUDA_LIB_DIR=$HPC_SDK_HOME/Linux_ppc64le/2021/cuda/lib64 # customize your path
# Compile MPI C++ code
pgc++ -c main.cpp -o main.cpp.o
# Compile OPENACC code
pgc++ $OPENACC_ARCH_FLAGS -c test_openacc.cpp -o openacc.cpp.o
# Compile CUDA code
nvcc $CUDA_ARCH_FLAGS -dc test_cuda.cu -o cuda.cu.o
# removing openacc.cpp.o from the cmake_device_link objects works without errors
DLINK_OBJS="cuda.cu.o main.cpp.o openacc.cpp.o" # <=== this causes the error
nvcc $CUDA_ARCH_FLAGS -dlink $DLINK_OBJS -o cmake_device_link.o
# Generate executable
nvc++ $OPENACC_ARCH_FLAGS -o main cuda.cu.o openacc.cpp.o main.cpp.o cmake_device_link.o -L$CUDA_LIB_DIR -lcudadevrt -lcudart_static -lrt
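As a minimal sketch of the workaround mentioned above, the device-link object list can be restricted to the objects produced by nvcc. The ALL_OBJS variable and the assumption that CUDA objects end in .cu.o are only for illustration; the nvcc and nvc++ commands are left as comments because they need the toolchain and the flag variables defined in the manual steps.

```shell
# All objects of the target, named as in the manual steps above.
ALL_OBJS="cuda.cu.o main.cpp.o openacc.cpp.o"

# Keep only objects compiled by nvcc (assumed here to end in .cu.o),
# so the OpenACC object never reaches the device link step.
DLINK_OBJS=""
for obj in $ALL_OBJS; do
  case "$obj" in
    *.cu.o) DLINK_OBJS="${DLINK_OBJS:+$DLINK_OBJS }$obj" ;;
  esac
done

echo "device link objects: $DLINK_OBJS"

# The device link and host link would then proceed as before:
#   nvcc $CUDA_ARCH_FLAGS -dlink $DLINK_OBJS -o cmake_device_link.o
#   nvc++ $OPENACC_ARCH_FLAGS -o main $ALL_OBJS cmake_device_link.o \
#       -L$CUDA_LIB_DIR -lcudadevrt -lcudart_static -lrt
```

With the object names used in this post, only cuda.cu.o is passed to the device link, while the host link still receives all three objects.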
This is my CMakeLists.txt file:
project(test_cuda_openacc CUDA CXX)
cmake_minimum_required(VERSION 3.20)
set(TARGETNAME test_cuda_openacc)
set(SOURCES main.cpp test_cuda.cu test_openacc.cpp)
add_executable(${TARGETNAME} ${SOURCES})
set(CUDA_ARCH "70")
set_target_properties(${TARGETNAME} PROPERTIES CUDA_ARCHITECTURES ${CUDA_ARCH})
if(FAKE_DLINK)
# artificially perform separate compilation without CMAKE support
# add relocatable device code flag
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -rdc true")
# this instructs the compiler to link against CUDA objects
target_link_options(${TARGETNAME} BEFORE PRIVATE "-cuda")
else(FAKE_DLINK) # this method DOES NOT WORK !!!
set_target_properties(${TARGETNAME} PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
endif(FAKE_DLINK)
find_package(OpenACC REQUIRED)
set(OpenACC_CXX_FLAGS "-acc=gpu -acc=noautopar -Minfo=accel -gpu=cc${CUDA_ARCH}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenACC_CXX_FLAGS}")
Am I missing something? How can I control which objects should be included in the device linking step?
Thank you for your attention.
test_cuda_openacc_cmake.tar.gz
ENVIRONMENT:
CMake version 3.20.0
HPC-SDK 2021
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:08:50_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0
$ pgc++ --version
pgc++ (aka nvc++) 21.5-0 linuxpower target on Linuxpower
PGI Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.