NVHPC: Multiple languages + OpenACC prevents GPU usage
Explanation
When compiling with OpenACC, the order in which the libraries are linked matters. The following three libraries:
- libacchost.so
- libaccdevice.so
- libnvc.so
all contain symbols of the form acc_XXX, so each symbol is taken from whichever library is linked first. When a C program is compiled, the libraries are linked in the order above by default. The acc symbols in libaccdevice.so therefore have priority and everything works as expected. If, however, -lnvc is passed when linking a program, the libraries are ordered as follows:
- libnvc.so
- libacchost.so
- libaccdevice.so
When the acc symbols in libnvc.so are used, the code does not run on the GPU. This can be seen by setting export NV_ACC_NOTIFY=3, which makes the runtime report kernel launches and data transfers.
CMake does not pass -lnvc unless it is building code written in multiple languages. In that case the flag is passed and the problem above occurs.
Reproducer
If we compile the following code:
int main(int argc, char** argv) {
    double a[100];
    #pragma acc parallel loop
    for (int i = 0; i < 100; ++i) {
        a[i] = 2.0;
    }
    return 0;
}
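we can observe the problem. Without -lnvc the kernel launch is reported; with -lnvc, ./test prints nothing and ldd lists libnvc.so first: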
$ export NV_ACC_NOTIFY=3
$ nvc -acc hello_world.c -o test --diag_suppress set_but_not_used
$ ./test
launch CUDA kernel file=/home/EB030696/test/acc_c/hello_world.c function=main line=4 device=0 threadid=1 num_gangs=2 num_workers=1 vector_length=96 grid=2 block=96
download CUDA data file=/home/EB030696/test/acc_c/hello_world.c function=main line=9 device=0 threadid=1 variable=a[:] bytes=800
$ ldd test
linux-vdso.so.1 => (0x00007ffec7f86000)
/Applications/compilers/gcc/10.3.0/lib64/libstdc++.so.6 (0x00007f819ab03000)
-> libacchost.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.3/compilers/lib/libacchost.so (0x00007f819a8a1000)
libaccdevaux.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.3/compilers/lib/libaccdevaux.so (0x00007f819a688000)
-> libaccdevice.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.3/compilers/lib/libaccdevice.so (0x00007f819a35e000)
libdl.so.2 => /usr/lib/gcc/x86_64-redhat-linux/4.8.5//../../../../lib64/libdl.so.2 (0x00007f819a15a000)
libcudadevice.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.3/compilers/lib/libcudadevice.so (0x00007f8199f43000)
libnvomp.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.3/compilers/lib/libnvomp.so (0x00007f8198f42000)
libpthread.so.0 => /usr/lib/gcc/x86_64-redhat-linux/4.8.5//../../../../lib64/libpthread.so.0 (0x00007f8198d26000)
libnvcpumath.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.3/compilers/lib/libnvcpumath.so (0x00007f819890e000)
-> libnvc.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.3/compilers/lib/libnvc.so (0x00007f81986a9000)
libc.so.6 => /usr/lib/gcc/x86_64-redhat-linux/4.8.5//../../../../lib64/libc.so.6 (0x00007f81982db000)
libgcc_s.so.1 => /usr/lib/gcc/x86_64-redhat-linux/4.8.5//../../../../lib64/libgcc_s.so.1 (0x00007f81980c5000)
libm.so.6 => /usr/lib/gcc/x86_64-redhat-linux/4.8.5//../../../../lib64/libm.so.6 (0x00007f8197dc3000)
/lib64/ld-linux-x86-64.so.2 (0x00007f819aed1000)
$ nvc -acc hello_world.c -o test --diag_suppress set_but_not_used -lnvc
$ ./test
$ ldd test
linux-vdso.so.1 => (0x00007fffa7dab000)
/Applications/compilers/gcc/10.3.0/lib64/libstdc++.so.6 (0x00007f2f0acec000)
-> libnvc.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.3/compilers/lib/libnvc.so (0x00007f2f0aa87000)
-> libacchost.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.3/compilers/lib/libacchost.so (0x00007f2f0a825000)
libaccdevaux.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.3/compilers/lib/libaccdevaux.so (0x00007f2f0a60c000)
-> libaccdevice.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.3/compilers/lib/libaccdevice.so (0x00007f2f0a2e2000)
libdl.so.2 => /usr/lib/gcc/x86_64-redhat-linux/4.8.5//../../../../lib64/libdl.so.2 (0x00007f2f0a0de000)
libcudadevice.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.3/compilers/lib/libcudadevice.so (0x00007f2f09ec7000)
libnvomp.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.3/compilers/lib/libnvomp.so (0x00007f2f08ec6000)
libpthread.so.0 => /usr/lib/gcc/x86_64-redhat-linux/4.8.5//../../../../lib64/libpthread.so.0 (0x00007f2f08caa000)
libnvcpumath.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.3/compilers/lib/libnvcpumath.so (0x00007f2f08892000)
libc.so.6 => /usr/lib/gcc/x86_64-redhat-linux/4.8.5//../../../../lib64/libc.so.6 (0x00007f2f084c4000)
libgcc_s.so.1 => /usr/lib/gcc/x86_64-redhat-linux/4.8.5//../../../../lib64/libgcc_s.so.1 (0x00007f2f082ae000)
libm.so.6 => /usr/lib/gcc/x86_64-redhat-linux/4.8.5//../../../../lib64/libm.so.6 (0x00007f2f07fac000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2f0b0ba000)
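The comparison of the two ldd listings above can be automated. The following is a minimal sketch, assuming that the order printed by ldd reflects the order in which the acc symbols are resolved (as the listings above suggest). The function name check_acc_link_order is hypothetical and is not part of NVHPC or CMake.

```shell
# Hypothetical helper: reads `ldd` output on stdin and reports whether
# libnvc.so is listed before either OpenACC runtime library.
check_acc_link_order() {
  awk '
    /libnvc\.so/       && !nvc  { nvc  = NR }   # first line naming libnvc.so
    /libacchost\.so/   && !host { host = NR }   # first line naming libacchost.so
    /libaccdevice\.so/ && !dev  { dev  = NR }   # first line naming libaccdevice.so
    END {
      if (!nvc || !host || !dev) {
        print "unknown: expected libraries not all present"
        exit 2
      }
      if (nvc < host || nvc < dev) {
        print "bad: libnvc.so precedes an OpenACC library"
        exit 1
      }
      print "ok: OpenACC libraries precede libnvc.so"
    }'
}
```

It would be used as `ldd test | check_acc_link_order`; a non-zero exit status indicates the problematic ordering (or that the libraries were not found).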
Solution
This issue has been discussed on the NVIDIA developer forum: https://forums.developer.nvidia.com/t/failed-cuda-device-detection-when-explicitly-linking-libnvc/203225
The opinion of the NVIDIA developers is that this is a user error and that one should not link to compiler runtime libraries. When combining code written in multiple languages, I agree with CMake's conclusion that it is usually necessary to link to such libraries. However, in this case CMake should also mention the OpenACC libraries explicitly. If they are named explicitly, in the correct order, in the compiler command, then the problem should be fixed.
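As a sketch of that last point, the OpenACC libraries could be named ahead of -lnvc on the link line of the reproducer. The flag spellings -lacchost and -laccdevice are assumptions inferred from the library file names listed above, and this command has not been verified against the compiler.

```shell
# Assumption: -lacchost / -laccdevice are inferred from libacchost.so and
# libaccdevice.so. They are placed before -lnvc so that they are linked first.
ACC_LIBS="-lacchost -laccdevice"
LINK_CMD="nvc -acc hello_world.c -o test $ACC_LIBS -lnvc"
echo "$LINK_CMD"

# Only attempt the build where the compiler and the source file are available.
if command -v nvc >/dev/null 2>&1 && [ -f hello_world.c ]; then
  $LINK_CMD && NV_ACC_NOTIFY=3 ./test
fi
```

If the ordering is effective, running ./test with NV_ACC_NOTIFY=3 should again report the kernel launch, and ldd test should list libacchost.so and libaccdevice.so before libnvc.so.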