Asynchronous memory allocation for GPUs

Synchronous memory allocation can incur significant overhead on GPUs. This was noticed while testing multi-block filters. Asynchronous allocation is supported in CUDA versions > 11.3.

This is a redo of !2759 (closed). Much has changed since that MR, so this one replaces it.

Edited by Dave Pugmire
