WIP: Force CUDA timer to record for outside events.
The CUDA timer reports that no time passes in some of the device adapter tests. I believe this is because nothing happens between the start and end events used by the timer. This is a problem because, when timing algorithms, we want to include any work done before or after the CUDA calls, which could include serial CPU operations.
To get around this, add empty operations right after the start event and right before the end event. This forces the timer to record the wall time from beginning to end. It adds a few milliseconds to the overall time, which should be acceptable.
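A minimal sketch of the workaround described above, assuming an empty kernel launch serves as the "empty operation" and everything runs on the default stream. The class name `CudaWallTimer`, the kernel `EmptyKernel`, and the `CUDA_CHECK` macro are hypothetical illustrations, not the actual timer implementation touched by this commit. It only mirrors the described approach and has not been measured here.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails (illustrative helper).
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
      std::exit(EXIT_FAILURE);                                        \
    }                                                                 \
  } while (0)

// Hypothetical no-op kernel used as the "empty operation".
__global__ void EmptyKernel() {}

// Hypothetical timer: brackets the timed region with empty kernel launches
// so the start and end events are not left with nothing between them.
class CudaWallTimer {
public:
  CudaWallTimer() {
    CUDA_CHECK(cudaEventCreate(&start_));
    CUDA_CHECK(cudaEventCreate(&end_));
  }
  ~CudaWallTimer() {
    cudaEventDestroy(start_);
    cudaEventDestroy(end_);
  }
  CudaWallTimer(const CudaWallTimer&) = delete;
  CudaWallTimer& operator=(const CudaWallTimer&) = delete;

  void Start() {
    CUDA_CHECK(cudaEventRecord(start_, 0));
    EmptyKernel<<<1, 1>>>();  // empty operation right after the start event
    CUDA_CHECK(cudaGetLastError());
  }

  void Stop() {
    EmptyKernel<<<1, 1>>>();  // empty operation right before the end event
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaEventRecord(end_, 0));
    CUDA_CHECK(cudaEventSynchronize(end_));  // wait until end_ is recorded
  }

  // Elapsed time between the two events, in milliseconds.
  float ElapsedMs() const {
    float ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&ms, start_, end_));
    return ms;
  }

private:
  cudaEvent_t start_;
  cudaEvent_t end_;
};
```

To use it, call `Start()`, run the CPU and GPU work being timed, then call `Stop()` and read `ElapsedMs()`. The extra kernel launches are the small overhead mentioned above.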