Support deferred freeing for CUDA memory
Calls to 'cudaFree' block execution on all CUDA devices. Reduce the number of times this happens with a deferred free mechanism that collects pointers in a pool and frees them together once a threshold is reached.
This is especially helpful during virtual object transfers, which require a few small allocations and frees.
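
For illustration only, here is a minimal sketch of the deferred free idea. It is not the code in this MR: the class name `DeferredFreePool`, its methods, and the injected `FreeFn` callback are all hypothetical, and the release function is passed in so the sketch compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical helper (not the class added by this MR): queues pointers and
// releases them in one batch once `threshold` pointers have accumulated.
class DeferredFreePool
{
public:
  using FreeFn = std::function<void(void*)>;

  DeferredFreePool(std::size_t threshold, FreeFn freeFn)
    : Threshold(threshold)
    , Free(std::move(freeFn))
  {
    assert(threshold > 0);
    this->Pending.reserve(threshold);
  }

  // Release anything still queued when the pool goes away.
  ~DeferredFreePool() { this->Flush(); }

  // Queue a pointer; the whole batch is released when the threshold is hit.
  void DeferFree(void* ptr)
  {
    if (ptr == nullptr)
    {
      return;
    }
    std::vector<void*> batch;
    {
      std::lock_guard<std::mutex> lock(this->Mutex);
      this->Pending.push_back(ptr);
      if (this->Pending.size() < this->Threshold)
      {
        return;
      }
      batch.swap(this->Pending);
    }
    this->Release(batch);
  }

  // Release every queued pointer immediately.
  void Flush()
  {
    std::vector<void*> batch;
    {
      std::lock_guard<std::mutex> lock(this->Mutex);
      batch.swap(this->Pending);
    }
    this->Release(batch);
  }

  std::size_t PendingCount() const
  {
    std::lock_guard<std::mutex> lock(this->Mutex);
    return this->Pending.size();
  }

private:
  // Called outside the lock so a slow release does not block other callers.
  void Release(const std::vector<void*>& batch)
  {
    for (void* ptr : batch)
    {
      this->Free(ptr);
    }
  }

  std::size_t Threshold;
  FreeFn Free;
  mutable std::mutex Mutex;
  std::vector<void*> Pending;
};
```

With CUDA, the callback could be something like `[](void* p) { cudaFree(p); }`, so the blocking calls happen back to back in one burst instead of being scattered through the transfer. The threshold is a trade-off: a larger value means fewer stalls but more device memory held by pointers that are no longer in use.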
Based on @robertmaynard's code in !1155 (closed).
Edited by Sujin Philip