Improve Abort Functionality
Summary: Our goal is to redesign how VTK algorithms are interrupted (aborted) so that the mechanism works with multi-threaded execution. Currently, the abort mechanism works only within a single thread, when the abort flag is set by an observer of the progress event. This severely limits its usefulness, as evidenced by how rarely it is used: it has many bugs that no one has reported. Our design enables aborting algorithms from a separate thread.
The changes described below were conceptualized by Berk Geveci and implemented by Stephen Crowell. Since this is a large change, we are looking for input on whether it is well thought out and on any issues that could come up.
Requirements:
- Aborting an algorithm should cause the currently executing algorithm and all algorithms downstream of it to short-circuit their execution.
- When running an update after an abort, execution should restart at the filter that was executing when the abort happened.
- Setting abort from another thread is supported.
- Pipelines that contain MPI-enabled filters should not deadlock when aborted or restarted.
High Level Design:
- The AbortExecute flag has been changed to atomic.
- Making the flag atomic allows AbortExecute to be set from another thread without a data race on the flag.
- Algorithms call CheckAbort during execution. CheckAbort checks the current algorithm and upstream algorithms for a set AbortExecute flag.
- This is the first step in aborting an algorithm. Previously, if an upstream algorithm was aborted, the currently executing algorithm had no way of knowing and continued to execute as normal. Checking upstream lets the current algorithm know that it needs to abort. This is necessary for multi-threaded execution because the filter whose flag is set may already have finished by the time the flag is set.
- Aborted algorithms exit early and return empty data.
- Returning early short-circuits execution. Returning empty data shortens the execution time of downstream filters. The pipeline execution itself is not stopped, to avoid deadlock when the pipeline contains parallel (MPI) algorithms that perform communication. All filters therefore need to function properly with empty data.
- Pipeline continues executing with empty data and a new ABORTED flag is passed downstream.
- Passing the ABORTED flag downstream allows other filters to know they should abort as well.
- When an update is run, filters with an ABORTED flag will need to re-execute.
- A filter has the ABORTED flag in two cases: it was running when an AbortExecute flag was set, or it is downstream of such a filter. In both cases the filter returned empty data and needs to re-execute.
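The design points above can be sketched without VTK. The following is a minimal, self-contained model, not the prototype's code: `Filter`, `Pipeline`, `Update`, `FullRuns`, and the `abortAt` hook are hypothetical names introduced for illustration, and clearing stale abort requests at the start of an update is an assumption of the sketch. The hook sets the flag on the source while a later filter is executing, standing in for a second thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a pipeline filter; this is not the vtkAlgorithm API.
struct Filter
{
  std::atomic<bool> AbortExecute{false}; // may safely be set from any thread
  bool Aborted = false;                  // models the ABORTED flag passed downstream
  bool UpToDate = false;                 // output is valid and need not be recomputed
  int FullRuns = 0;                      // executions that ran to completion
};

// Filters are stored upstream-first: filter i depends on filters 0..i-1.
using Pipeline = std::vector<Filter>;

// True if filter `index` or any filter upstream of it has AbortExecute set.
bool CheckAbort(const Pipeline& pipeline, std::size_t index)
{
  assert(index < pipeline.size());
  for (std::size_t i = 0; i <= index; ++i)
  {
    if (pipeline[i].AbortExecute.load())
    {
      return true;
    }
  }
  return false;
}

// Runs one update. `abortAt` is a test hook: when filter `abortAt` starts
// executing, AbortExecute is set on the source (filter 0).
void Update(Pipeline& pipeline, int abortAt = -1)
{
  // Assumption of this sketch: stale abort requests are cleared per update.
  for (Filter& f : pipeline)
  {
    f.AbortExecute = false;
  }

  bool abortedUpstream = false;
  for (std::size_t i = 0; i < pipeline.size(); ++i)
  {
    Filter& f = pipeline[i];
    if (f.UpToDate)
    {
      continue; // finished in an earlier update; nothing to redo
    }
    if (static_cast<int>(i) == abortAt)
    {
      pipeline[0].AbortExecute = true;
    }
    if (abortedUpstream || CheckAbort(pipeline, i))
    {
      f.Aborted = true; // empty output; the flag travels downstream
      abortedUpstream = true;
      continue;         // the loop keeps visiting downstream filters
    }
    f.Aborted = false;
    f.UpToDate = true;
    ++f.FullRuns;
  }
}
```

With a four-filter pipeline, `Update(p, 2)` leaves filters 0 and 1 complete and filters 2 and 3 marked `Aborted`. A following `Update(p)` skips filters 0 and 1 and re-executes only filters 2 and 3.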
Details:
Modifications to existing filters:
- The only modifications to a filter are adding the call to CheckAbort and adding the short-circuit logic. Each filter needs to place the CheckAbort call where it will be executed frequently, but not so frequently that the check itself becomes a noticeable cost. The short-circuit logic varies from filter to filter, but the core concept is to stop any data processing loops from running when the AbortExecute flag is set. We choose not to return from the filter at the point of the check, since some filters deallocate memory or perform other cleanup steps before returning. The Prototype section has examples of changing a filter to call CheckAbort and short-circuit.
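As an illustration of this pattern, here is a minimal sketch of a processing loop that checks for an abort every few iterations and then falls through to its cleanup code. It does not use VTK: `ShrinkCells`, `Result`, `checkInterval`, and the `abortAtCell` hook (which sets the flag mid-loop in place of another thread) are hypothetical names chosen for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical result type for the sketch.
struct Result
{
  std::vector<double> Output;     // left empty when the filter was aborted
  std::size_t CellsProcessed = 0;
  bool Aborted = false;
  bool CleanupRan = false;
};

// Halves every input value, checking the abort flag every `checkInterval` cells.
Result ShrinkCells(const std::vector<double>& input, std::atomic<bool>& abortExecute,
  std::size_t checkInterval, std::size_t abortAtCell)
{
  assert(checkInterval > 0);
  Result result;
  std::vector<double> scratch(input.size()); // stands in for resources freed in cleanup

  for (std::size_t i = 0; i < input.size(); ++i)
  {
    if (i == abortAtCell)
    {
      abortExecute = true; // test hook standing in for a request from another thread
    }
    // Check only every `checkInterval` cells to keep the overhead low.
    if (i % checkInterval == 0 && abortExecute.load())
    {
      result.Aborted = true;
      break; // leave the loop, but do not return from the function here
    }
    scratch[i] = input[i] * 0.5;
    ++result.CellsProcessed;
  }

  if (!result.Aborted)
  {
    result.Output = std::move(scratch);
  }

  // Cleanup is reached on both the normal and the aborted path.
  scratch.clear();
  result.CleanupRan = true;
  return result;
}
```

With ten input cells, a check interval of 4, and the flag set at cell 5, the loop notices the abort at cell 8, so eight cells are processed before it stops. A larger interval lowers the overhead of checking but delays the reaction to an abort.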
Optimization with MTime:
- Since CheckAbort is called often, having it travel upstream on every call would be expensive. To avoid this, we use two MTime timestamps: LastAbortTime and LastAbortCheckTime. LastAbortTime is a global timestamp updated whenever an AbortExecute flag is set, and LastAbortCheckTime is a per-filter timestamp updated whenever that filter travels upstream to check for a set AbortExecute flag. Before traveling upstream, CheckAbort compares the two timestamps and traverses upstream only if LastAbortTime is more recent than LastAbortCheckTime; otherwise no flag has been set since the last check and the traversal is skipped.
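The timestamp comparison can be sketched as follows. This is a simplified standalone model rather than the prototype's code: `GlobalClock` stands in for VTK's modification-time counter, and `RequestAbort`, `UpstreamAborted`, and `UpstreamTraversals` are hypothetical names. Caching a positive result in `UpstreamAborted` is an assumption of the sketch, made so that repeated calls keep returning true without another traversal.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Monotonic counter standing in for VTK's global modification time.
std::atomic<std::uint64_t> GlobalClock{0};
// Global timestamp of the most recent abort request.
std::atomic<std::uint64_t> LastAbortTime{0};

// Hypothetical stand-in for a pipeline filter.
struct Filter
{
  std::atomic<bool> AbortExecute{false};
  std::uint64_t LastAbortCheckTime = 0; // when this filter last looked upstream
  bool UpstreamAborted = false;         // cached result of the last traversal (assumption)
  int UpstreamTraversals = 0;           // instrumentation for the example
};

// Sets the flag and stamps the global abort time (hypothetical name).
void RequestAbort(Filter& filter)
{
  filter.AbortExecute = true;
  LastAbortTime = ++GlobalClock;
}

// Filters are stored upstream-first: filter i depends on filters 0..i-1.
bool CheckAbort(std::vector<Filter>& pipeline, std::size_t index)
{
  assert(index < pipeline.size());
  Filter& self = pipeline[index];
  if (self.AbortExecute.load() || self.UpstreamAborted)
  {
    return true; // cheap checks first
  }
  if (LastAbortTime.load() <= self.LastAbortCheckTime)
  {
    return false; // no abort was requested since the last traversal
  }

  // An abort was requested somewhere since the last check: look upstream once.
  ++self.UpstreamTraversals;
  self.LastAbortCheckTime = ++GlobalClock;
  for (std::size_t i = 0; i < index; ++i)
  {
    if (pipeline[i].AbortExecute.load())
    {
      self.UpstreamAborted = true;
      return true;
    }
  }
  return false;
}
```

In this model, any number of `CheckAbort` calls made before an abort request perform no traversal at all. An abort request on an unrelated (for example, downstream) filter costs exactly one traversal per checking filter, after which the calls are cheap again.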
MPI algorithms:
- MPI algorithms are more complicated to abort than most filters. We must be careful when adding the CheckAbort call, since every process must perform the same number of calls and no process may block because of CheckAbort. As a result, CheckAbort should not be called while MPI communication is occurring, and an abort should not cause an MPI algorithm to exit early, since other processes could then block. MPI algorithms will need a barrier at the beginning and at the end of execution to make sure that the algorithm is aborted across all ranks and re-executes on the next update. If some ranks are not aborted and therefore do not re-execute the next time, deadlocks can occur.
Prototype:
An initial implementation of the design described above can be found here. This implementation modifies vtkAlgorithm to include CheckAbort and its associated helper functions. vtkAlgorithm also has a new function that sets AbortExecute and updates LastAbortTime. vtkDemandDrivenPipeline has been modified to pass the ABORTED flag downstream and to tell an algorithm to re-execute based on the ABORTED flag. The vtkRTAnalyticSource, vtkShrinkFilter, vtkContourGrid, and vtkClipDataSet filters have also been modified to include the CheckAbort call and the short-circuit logic.
A testable example can be found here. The example has two threads. ThreadA sets up a four-filter pipeline with the filters mentioned previously and calls update twice. After each update, the AbortExecute flag and ABORTED flag are printed for each filter. ThreadB waits seven minutes and then tells the wavelet filter to abort. This causes the currently executing filter to detect the abort and end execution early. The second update then begins execution at the filter that was running when the abort occurred during the first update.