Investigate the CUDA error when using Kokkos reduce
When using Kokkos::parallel_reduce for the implementation of DeviceAdapterAlgorithm<DeviceAdapterTagKokkos>::Reduce, we frequently get the following run-time error:
cudaFuncGetAttributes( &attr, cuda_parallel_launch_local_memory<DriverType>) error( cudaErrorInvalidDeviceFunction): invalid device function /home/sujin/Projects/kokkos/installs/release/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp:495
The exact cause of the error is not known. We do know that this is a CUDA-specific issue, as we don't see anything similar with any of the other backends, including HIP.
We are currently using a workaround where we call DeviceAdapterAlgorithmGeneral::Reduce when using the Cuda execution space, but this issue needs to be investigated further.
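For anyone picking this up, the shape of the workaround can be sketched as a compile-time dispatch on the execution space. This is only an illustration, not the actual VTK-m code: FakeCudaSpace, FakeHipSpace, ReduceGeneral, ReduceNative, and Reduce are hypothetical stand-ins so that the snippet builds without Kokkos, and both paths simply sum on the host.

```cpp
#include <numeric>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for execution spaces; these are NOT the Kokkos types.
struct FakeCudaSpace {};
struct FakeHipSpace {};

// Stand-in for the generic fallback (the role DeviceAdapterAlgorithmGeneral::Reduce
// plays in the workaround). Records which path ran so the dispatch is observable.
inline int ReduceGeneral(const std::vector<int>& values, int initial, std::string& pathTaken)
{
  pathTaken = "general";
  return std::accumulate(values.begin(), values.end(), initial);
}

// Stand-in for the backend-native path (the role Kokkos::parallel_reduce plays).
inline int ReduceNative(const std::vector<int>& values, int initial, std::string& pathTaken)
{
  pathTaken = "native";
  return std::accumulate(values.begin(), values.end(), initial);
}

// Overload chosen when the execution space is the (fake) Cuda space.
inline int ReduceImpl(const std::vector<int>& values, int initial, std::string& pathTaken,
                      std::true_type /*isCuda*/)
{
  return ReduceGeneral(values, initial, pathTaken);
}

// Overload chosen for every other execution space.
inline int ReduceImpl(const std::vector<int>& values, int initial, std::string& pathTaken,
                      std::false_type /*isCuda*/)
{
  return ReduceNative(values, initial, pathTaken);
}

// Entry point: selects the fallback at compile time for the Cuda stand-in only.
template <typename ExecSpace>
int Reduce(const std::vector<int>& values, int initial, std::string& pathTaken)
{
  return ReduceImpl(values, initial, pathTaken, std::is_same<ExecSpace, FakeCudaSpace>{});
}
```

Because the choice is made through overload resolution, the native path is never instantiated for the Cuda stand-in, which mirrors the intent of avoiding the failing kernel launch until the root cause is found.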