CUDA: Clang separable compilation

Raul Tambre requested to merge tambre/cmake:cuda_clang_separable into master

With NVCC, the compiler itself takes care of device linking when passed the "-dlink" flag. Clang has no equivalent, so the buildsystem has to do the work that NVCC does behind the scenes.
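
From the user's side, the feature would be exercised through the existing `CUDA_SEPARABLE_COMPILATION` target property. The following is a minimal sketch, not taken from this merge request: the project name, target names, source file names, architecture value, and minimum CMake version are all assumptions chosen for illustration, and Clang is assumed to be selected at configure time (for example via `-DCMAKE_CUDA_COMPILER=clang++`).

```cmake
# Hypothetical project; the minimum version is an assumption, not stated in this MR.
cmake_minimum_required(VERSION 3.20)
project(rdc_demo LANGUAGES CXX CUDA)

# kernel_a.cu and kernel_b.cu are placeholder sources that call each other's
# device functions, which is what makes a device link step necessary.
add_library(kernels STATIC kernel_a.cu kernel_b.cu)

set_target_properties(kernels PROPERTIES
  CUDA_SEPARABLE_COMPILATION ON   # compile as relocatable device code
  CUDA_ARCHITECTURES 70           # example value; pick one matching your GPU
)

# main.cpp is a placeholder host-only source that launches the kernels.
add_executable(app main.cpp)
target_link_libraries(app PRIVATE kernels)
```

With NVCC the device link for `app` is delegated to the compiler; with Clang the generated buildsystem would have to run the equivalent steps itself.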

The implementation is based on Bazel's device linking documentation:

Implements #20726 (closed).

