
Don't require CUDA_LAUNCH_BLOCKING for Kokkos Cuda backend

  1. The code now works without CUDA_LAUNCH_BLOCKING set, using explicit synchronization where required.
  2. The code now also uses thread-specific memory spaces, which for Kokkos' Cuda backend means per-thread streams.
