Release CUDA resources in filter operations

When a filter runs, several fields are moved to or created in the execution environment. In particular, filters commonly run an operation on every field to transform it from input to output, which can quickly fill up the memory on CUDA devices. We need to manage this memory better. At the very least, we should remove the input field arrays that were transferred to the device once the operation is done with them. We may also want to check whether a field already exists on a CUDA device before doing the transform there, and do the transform somewhere else if it does not. (The ArrayHandleCopy code has something that behaves this way.)
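
As a rough illustration of the "remove input field arrays that are transferred" idea, here is a minimal, self-contained sketch in plain C++. It does not use the real array handle or filter classes: `FieldArray`, `PrepareForDevice`, `ReleaseDeviceResources`, and `MapField` are hypothetical stand-ins, and the device copy is modeled by a simple flag rather than an actual CUDA allocation. The doubling transform is only a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a field array that may have a copy on a device.
// The device copy is modeled by a flag; no real CUDA memory is involved.
struct FieldArray
{
  std::string name;
  std::vector<float> host; // control-side data
  bool onDevice;           // true while a (simulated) device copy exists

  FieldArray(std::string n, std::vector<float> h)
    : name(std::move(n)), host(std::move(h)), onDevice(false)
  {
  }

  // Simulated device footprint of this array in bytes.
  std::size_t DeviceBytes() const
  {
    return onDevice ? host.size() * sizeof(float) : 0;
  }

  void PrepareForDevice() { onDevice = true; }        // models a host-to-device transfer
  void ReleaseDeviceResources() { onDevice = false; } // models freeing the device copy
};

// Hypothetical per-field operation of a filter. The input is moved to the
// device only for the duration of the operation. If this call caused the
// transfer, the device copy is released again before returning. An input
// that was already resident on the device is left untouched.
FieldArray MapField(FieldArray& input)
{
  const bool wasOnDevice = input.onDevice;
  input.PrepareForDevice();

  FieldArray output(input.name, std::vector<float>(input.host.size()));
  output.onDevice = true; // the result is created in the execution environment
  for (std::size_t i = 0; i < input.host.size(); ++i)
  {
    output.host[i] = 2.0f * input.host[i]; // placeholder transform
  }

  if (!wasOnDevice)
  {
    input.ReleaseDeviceResources(); // drop only the copy this call created
  }
  return output;
}
```

With this policy, running `MapField` over every field of a data set leaves only the outputs (plus whatever was already resident) on the device, instead of both the inputs and the outputs.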