CUDA: Clang separable compilation

Raul Tambre requested to merge tambre/cmake:cuda_clang_separable into master

With NVCC, the compiler performs device linking itself when passed the "-dlink" flag. Clang has no equivalent mode, so the buildsystem has to carry out the device linking steps that NVCC handles behind the scenes.
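
For context, a minimal sketch of a project that would exercise this code path. The target names, source file names, and architecture value are hypothetical, and the minimum version shown is only a placeholder (it is the release that introduced `CUDA_ARCHITECTURES`); use whichever CMake release contains this change.

```cmake
# Placeholder minimum version; assumption, not the release containing this change.
cmake_minimum_required(VERSION 3.18)
project(SeparableDemo LANGUAGES CXX CUDA)

# Hypothetical sources: helpers.cu defines __device__ functions that
# kernels.cu calls, so the device code must be linked across translation units.
add_library(kernels STATIC kernels.cu helpers.cu)
set_target_properties(kernels PROPERTIES
  CUDA_SEPARABLE_COMPILATION ON  # request relocatable device code
  CUDA_ARCHITECTURES 70          # example value, adjust for the target GPU
)

add_executable(app main.cu)
set_target_properties(app PROPERTIES
  CUDA_SEPARABLE_COMPILATION ON
  CUDA_ARCHITECTURES 70
)
target_link_libraries(app PRIVATE kernels)
```

Configuring with `-DCMAKE_CUDA_COMPILER=clang++` selects Clang as the CUDA compiler, in which case the device link step has to be generated by CMake rather than delegated to the compiler.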

The implementation is based on the device linking steps documented in TensorFlow's Bazel build definitions for NCCL: https://github.com/tensorflow/tensorflow/blob/7cabcdf073abad8c46e9dda62bb8fa4682d2061e/third_party/nccl/build_defs.bzl.tpl#L259

Implements #20726 (closed).