Synchronize the CUDA timer on both the start and end events
Previously, the timer for CUDA devices called cudaEventSynchronize only on the end event, when the elapsed time was requested. Because the start event was never synchronized, time could pass between resetting the timer and the start event actually occurring, and that time was not recorded in the timer. Synchronizing on the start event as well should ensure that all time spent in CUDA is recorded.
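
For illustration, a minimal sketch of the pattern described above. This is not the code touched by this commit: the struct name CudaTimerSketch and its members are hypothetical, and placing the start synchronization in Reset() is an assumption about where it belongs. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical timer for illustration only; names are not from the real code.
struct CudaTimerSketch
{
  cudaEvent_t StartEvent;
  cudaEvent_t EndEvent;

  CudaTimerSketch()
  {
    cudaEventCreate(&StartEvent);
    cudaEventCreate(&EndEvent);
    Reset();
  }

  ~CudaTimerSketch()
  {
    cudaEventDestroy(StartEvent);
    cudaEventDestroy(EndEvent);
  }

  void Reset()
  {
    // Enqueue the start event on the default stream.
    cudaEventRecord(StartEvent, 0);
    // Block the host until the start event has actually occurred, so no
    // unrecorded time passes between the reset and the start event
    // (assumed location of the added synchronization).
    cudaEventSynchronize(StartEvent);
  }

  float GetElapsedSeconds()
  {
    // Enqueue the end event and wait for it, as the timer already did.
    cudaEventRecord(EndEvent, 0);
    cudaEventSynchronize(EndEvent);

    float elapsedMilliseconds = 0.0f;
    cudaEventElapsedTime(&elapsedMilliseconds, StartEvent, EndEvent);
    return elapsedMilliseconds / 1000.0f;
  }
};
```

Without the synchronization in Reset(), work already queued on the stream could delay the start event well past the moment the host asked for the reset, and that interval would be missing from the reported time.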