Use async termination.
This change makes two significant modifications to the MPI communication in the flow filters. First, it replaces the particle exchange code. The send side is unchanged: the sender calls MPI_Isend to send particles to the destination rank. On the receive side, MPI_Iprobe is now used. When MPI_Iprobe reports that a message has arrived, the receiver queries the returned status for the message size, allocates the required memory, and then receives the particles.
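To make the probe-then-receive path concrete, here is a minimal single-process Python model. It is not the filter's C++ code. The `Mailbox` class, the `Particle` record and `receive_available` are hypothetical stand-ins: `iprobe` plays the role of MPI_Iprobe followed by MPI_Get_count on the returned status, and `recv` plays the role of MPI_Recv into a buffer sized from that count. In real MPI code the receive would also use the source and tag reported in the probed status.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Particle:
    id: int  # hypothetical; the filter's particle type carries much more state


class Mailbox:
    """Single-process model of one rank's not-yet-received messages.

    It mimics only what the receive path needs: a non-blocking probe that
    reports whether a message is pending, and its size, without consuming it.
    """

    def __init__(self):
        self._queue = deque()

    def isend(self, source, particles):
        # Stand-in for MPI_Isend: the sender posts the buffer and moves on.
        self._queue.append((source, list(particles)))

    def iprobe(self):
        # Stand-in for MPI_Iprobe followed by MPI_Get_count on the status.
        if not self._queue:
            return False, None, 0
        source, payload = self._queue[0]
        return True, source, len(payload)

    def recv(self, buf):
        # Stand-in for MPI_Recv into a caller-allocated buffer.
        source, payload = self._queue.popleft()
        if len(payload) > len(buf):
            raise ValueError("receive buffer smaller than the message")
        buf[: len(payload)] = payload
        return source


def receive_available(mailbox):
    """Receive every pending message, sizing each buffer from the probe."""
    received = []
    while True:
        flag, _source, count = mailbox.iprobe()
        if not flag:
            return received  # nothing has arrived; caller resumes local work
        buf = [None] * count  # allocate exactly what the probe reported
        mailbox.recv(buf)
        received.extend(buf)


mb = Mailbox()
mb.isend(1, [Particle(0), Particle(1)])
mb.isend(2, [Particle(7)])
print([p.id for p in receive_available(mb)])  # [0, 1, 7]
```

Because the probe does not consume the message, the receiver never needs to know message sizes in advance, which suits particle batches whose length varies from send to send.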
Second, the termination code has been replaced with the asynchronous termination detection algorithm published in Morozov et al., "IExchange: Asynchronous communication and termination detection for iterative algorithms", 2021 IEEE 11th Symposium on Large Data Analysis and Visualization (LDAV). The old code used a global counter to determine when all particles had completed. The new code instead determines when all ranks are out of work and no messages are in flight. Details of the method are below.
The method in the paper above ensures that all local work is completed and no messages remain in flight. The algorithm uses three states. Ranks start in State 0 while performing local work. When its local work is complete, a rank transitions to State 1 and posts a non-blocking barrier (ibarrier) to detect when all ranks have entered State 1. A dirty flag, initialized to false, tracks new work: it is set to true if work arrives after the rank reaches State 1. When the ibarrier completes, the rank moves to State 2, where a non-blocking global reduction (iallreduce) of the dirty flags checks whether any rank has encountered new work. If so, ranks reset to State 0; otherwise, termination is confirmed. The algorithm guarantees safe termination by using synchronous sends (issend), which complete only once the matching receive has started, together with message detection (iprobe), ensuring no work or messages are overlooked during transitions.
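The state machine described above can be sketched as a single-process Python simulation, assuming a toy workload rather than particle advection. Everything here is hypothetical: `World` models the inboxes and the two collectives (an ibarrier and a logical-OR iallreduce, each complete once every rank has posted it for the current round), a work item is just a remaining hop count that is either retired or forwarded to another rank, and an issend counts as complete only once the receiver has taken the message. A seeded random scheduler interleaves rank steps to mimic asynchronous progress.

```python
import random
from collections import deque


class World:
    """Hypothetical model of N ranks' inboxes plus ibarrier and iallreduce."""

    def __init__(self, nranks):
        self.n = nranks
        self.inbox = [deque() for _ in range(nranks)]  # sent, not yet received
        self.barrier = {}  # epoch -> set of ranks that posted the ibarrier
        self.reduce = {}   # epoch -> {rank: dirty flag}

    def ibarrier_post(self, rank, epoch):
        self.barrier.setdefault(epoch, set()).add(rank)

    def ibarrier_done(self, epoch):
        return len(self.barrier.get(epoch, ())) == self.n

    def iallreduce_post(self, rank, epoch, dirty):
        self.reduce.setdefault(epoch, {})[rank] = dirty

    def iallreduce_result(self, epoch):
        # Logical-OR reduction; None while some rank has not contributed.
        vals = self.reduce.get(epoch, {})
        return any(vals.values()) if len(vals) == self.n else None


class Rank:
    def __init__(self, rid, world, work):
        self.rid, self.world = rid, world
        self.work = deque(work)  # hypothetical work item: remaining hop count
        self.unmatched = []      # issend'd messages not yet received
        self.state, self.dirty, self.epoch = 0, False, 0
        self.retired, self.done = 0, False

    def issend(self, dest, item):
        msg = {"item": item, "matched": False}
        self.world.inbox[dest].append(msg)
        self.unmatched.append(msg)

    def step(self):
        w = self.world
        if w.inbox[self.rid]:              # iprobe found a message: receive it
            msg = w.inbox[self.rid].popleft()
            msg["matched"] = True          # completes the sender's issend
            self.work.append(msg["item"])
            if self.state != 0:
                self.dirty = True          # new work after reaching State 1
        if self.work:                      # one unit of local work
            hops = self.work.popleft()
            if hops == 0:
                self.retired += 1
            else:
                self.issend((self.rid + hops) % w.n, hops - 1)
        self.unmatched = [m for m in self.unmatched if not m["matched"]]

        if self.state == 0 and not self.work and not self.unmatched:
            self.state, self.dirty = 1, False
            w.ibarrier_post(self.rid, self.epoch)
        elif self.state == 1 and w.ibarrier_done(self.epoch):
            w.iallreduce_post(self.rid, self.epoch, self.dirty)
            self.state = 2
        elif self.state == 2:
            result = w.iallreduce_result(self.epoch)
            if result is True:             # some rank saw new work: start over
                self.state, self.epoch = 0, self.epoch + 1
            elif result is False:          # quiescent, nothing in flight
                self.done = True


def run(nranks, initial_work, seed=0, max_steps=100_000):
    world = World(nranks)
    ranks = [Rank(r, world, initial_work[r]) for r in range(nranks)]
    rng = random.Random(seed)  # random interleaving models asynchrony
    for _ in range(max_steps):
        active = [r for r in ranks if not r.done]
        if not active:
            return ranks, world
        rng.choice(active).step()
    raise RuntimeError("did not terminate within max_steps")


ranks, world = run(4, [[3, 5], [0], [], [7, 2, 2]], seed=1)
print(sum(r.retired for r in ranks))  # 6
```

The sketch shows why the synchronous send matters: a rank may enter State 1 only when it has no local work and no unmatched sends, so any message still in flight after the barrier must have been sent by a rank that received new work after entering State 1, and the dirty flags then force another round.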