Combining XLF with FindCUDAToolkit + CUDA-Fortran

This is perhaps a bit vague without a minimal reproducer, but are there examples or documentation anywhere for successful CMake builds of CUDA Fortran code with this level of complexity:

- IBM XL toolchain (XLF, XLC) with mixed C/C++/Fortran code on the Power9 architecture
- FindCUDAToolkit to pull in the appropriate CMake targets (e.g. CUDA::cudart for linking)
- -qcuda to enable CUDA on the XLF side
- Ninja as the generator
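A minimal sketch of the setup described above, assuming CMake 3.17 or newer (the release that added FindCUDAToolkit) and that the XL compilers are selected at configure time. The project name, the target name `app`, and the source file names are hypothetical placeholders:

```cmake
# Configure with, e.g.:
#   CC=xlc CXX=xlc++ FC=xlf2008_r cmake -G Ninja <source-dir>
cmake_minimum_required(VERSION 3.17)  # FindCUDAToolkit is available from 3.17
project(mixed_cuf LANGUAGES C CXX Fortran)

# Provides imported targets such as CUDA::cudart without enabling the CUDA language
find_package(CUDAToolkit REQUIRED)

# Hypothetical sources: a C main() calling into CUDA Fortran, plus C++ helpers
add_executable(app main.c helpers.cpp kernels.F90)
target_link_libraries(app PRIVATE CUDA::cudart)
```

Nothing in this sketch requests a device link step. FindCUDAToolkit only provides the imported library targets.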

I'm finding it really hard to get the build, and especially the linking (device linking in particular), to run smoothly. Is there a canonical CMake way to "force" device linking on top of the host linking? At the moment CMake seems perfectly happy to build without the device link.

Any hints on internal variables I might need to adjust here? Do I need enable_language(CUDA) even though I'm using CUDA Fortran and not invoking nvcc directly? It hasn't had much effect so far.

I get somewhat better behavior by using the xlcuf frontend for linking instead of xlf2008_r or xlc++, but I can never quite get around errors like "cudaGetSymbolAddress failed with error code 13: invalid device symbol" produced by __xlcuf_init(), for example when mixing a C main() with CUDA Fortran calls.
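One way to try the xlcuf link driver from within CMake is to override the Fortran link rule and make Fortran the linker language of the final executable. This is a sketch, not a confirmed fix: it assumes xlcuf is on the PATH and that letting it drive the final link is what performs the CUDA Fortran device link, as the behavior described above suggests. The target name `app` is hypothetical, and the snippet must come after that target is created:

```cmake
# Locate the CUDA Fortran invocation command of the XL toolchain (assumed to be on PATH)
find_program(XLCUF_EXECUTABLE xlcuf)
if(NOT XLCUF_EXECUTABLE)
  message(FATAL_ERROR "xlcuf not found; it is needed to drive the CUDA Fortran link")
endif()

# Same placeholders as CMake's default Fortran link rule, but with xlcuf as the driver
set(CMAKE_Fortran_LINK_EXECUTABLE
    "${XLCUF_EXECUTABLE} <FLAGS> <CMAKE_Fortran_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>")

# Use the Fortran link rule even though the target also has C/C++ sources
# (otherwise CMake's linker preference would pick the C++ rule);
# CMake still adds the C and C++ implicit runtime libraries for mixed-language targets
set_target_properties(app PROPERTIES LINKER_LANGUAGE Fortran)
```

Because the rule variable is directory-scoped, it affects every Fortran-linked executable defined in that directory, not only `app`.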

Perhaps this should just work and only some documentation is missing, or perhaps there are some manual shims you could suggest?

To make matters more complex, the online XLF documentation seems to suggest that host and device Fortran constructs are not genuinely separated until the IR is produced. This forces me to apply flags such as -qcuda only at the set_source_files_properties level, because otherwise the build fails on Fortran constructs.
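Restricting -qcuda to the CUDA Fortran sources might look like the following sketch. It assumes CMake 3.11 or newer for the COMPILE_OPTIONS source-file property, and the file names are hypothetical. By default these properties are only visible to targets created in the same directory (the DIRECTORY and TARGET_DIRECTORY options, available since CMake 3.18, change that):

```cmake
# Hypothetical list of sources that contain CUDA Fortran constructs
set(CUF_SOURCES
    kernels.F90
    device_data.F90)

# Enable CUDA Fortran only for these files; other Fortran, C and C++ sources are untouched
set_source_files_properties(${CUF_SOURCES} PROPERTIES COMPILE_OPTIONS "-qcuda")
```

If the same project must also configure with non-XL Fortran compilers, the option could be wrapped in a generator expression such as `$<$<Fortran_COMPILER_ID:XL>:-qcuda>`, which the COMPILE_OPTIONS source property accepts.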
