FindCUDA.cmake: cuda_add_cublas_to_target does not add cublas_device.lib correctly
When using cuBLAS within a kernel (dynamic parallelism), cublas_device.lib must be linked. However, cuda_add_cublas_to_target only adds this dependency to the host link command, not to the device link command (nvcc -dlink). On Windows, failing to link the dependency on the device side does not produce a link-time error; instead, a runtime error occurs on the first CUDA call. On Linux, the link does appear to fail as expected. See http://stackoverflow.com/questions/39568343/unknown-error-on-first-cudamalloc-if-cublas-is-present-in-kernel
A workaround for this issue is to add the library to the nvcc flags directly:
list(APPEND CUDA_NVCC_FLAGS -lcublas_device)
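For context, here is a minimal sketch of where the workaround might sit in a CMakeLists.txt. The project name, the target name my_app, the source file main.cu, and the compute_35 architecture flag are assumptions made for illustration and are not taken from the report. The point it shows is ordering: FindCUDA reads CUDA_NVCC_FLAGS when cuda_add_executable is called, so the flag has to be appended before the target is created.

```cmake
cmake_minimum_required(VERSION 3.5)
project(cublas_device_demo CXX)   # hypothetical project name

find_package(CUDA REQUIRED)

# Dynamic parallelism needs relocatable device code. With this option on,
# FindCUDA adds a separate device link step (nvcc -dlink) to the target.
set(CUDA_SEPARABLE_COMPILATION ON)

# Assumed target architecture; dynamic parallelism requires
# compute capability 3.5 or newer.
list(APPEND CUDA_NVCC_FLAGS -gencode arch=compute_35,code=sm_35)

# Workaround from the report: pass the device-side cuBLAS library to nvcc
# so that it also reaches the device link command.
list(APPEND CUDA_NVCC_FLAGS -lcublas_device)

# Create the target only after CUDA_NVCC_FLAGS is complete.
cuda_add_executable(my_app main.cu)   # hypothetical target and source

# Still needed for the host-side cuBLAS link.
cuda_add_cublas_to_target(my_app)
```

Appending the flag after cuda_add_executable would likely have no effect on that target, since the nvcc commands have already been generated at that point.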