vtkParallelRenderManager and vtkCocoaRenderWindow bug
In VTK master, there's a hang in vtkParallelRenderManager::StartServices on Mac arm64 that doesn't happen on my Linux x86 system (I haven't tested on Mac x86, but I assume it happens there too). I discovered it while writing a new parallel test, so I ran a couple of existing tests that use vtkCompositeRenderManager to see whether it's a me problem or an existing problem, and those tests show the same issue.
The test Filters/Parallel/Testing/Cxx/PTextureMapToSphere.cxx hangs in StartServices on rank 1, while rank 0 does return from StopServices.
In the tests IO/ADIOS2/Testing/Cxx/TestADIOS2BPReaderMPISingleTimeStep.cxx and IO/ADIOS2/Testing/Cxx/TestADIOS2BPReaderMPIMultiTimeSteps2D.cxx, a segfault happens with the following output:
566: Process id 58962 Caught SIGSEGV at 0x0x0 invalid permission for mapped object
566: Program Stack:
566: 0x19addaa24 : _sigtramp [(libsystem_platform.dylib) ???:-1]
566: 0x10197de0c : std::__1::deque<unsigned char, std::__1::allocator<unsigned char>>::front() [(libvtkParallelCore-pv5.12.5.12.dylib) ???:-1]
566: 0x10197fc74 : vtkMultiProcessStream::operator>>(int&) [(libvtkParallelCore-pv5.12.5.12.dylib) ???:-1]
566: 0x100700dd8 : vtkParallelRenderManager::RenderWindowInfo::Restore(vtkMultiProcessStream&) [(libvtkRenderingParallel-pv5.12.5.12.dylib) ???:-1]
566: 0x1006ff804 : vtkParallelRenderManager::SatelliteStartRender() [(libvtkRenderingParallel-pv5.12.5.12.dylib) ???:-1]
566: 0x1006f8eb8 : vtkParallelRenderManager::GenericStartRenderCallback() [(libvtkRenderingParallel-pv5.12.5.12.dylib) ???:-1]
566: 0x1006f67dc : GenericStartRender(vtkObject*, unsigned long, void*, void*) [(libvtkRenderingParallel-pv5.12.5.12.dylib) ???:-1]
566: 0x13bce792c : vtkCallbackCommand::Execute(vtkObject*, unsigned long, void*) [(libvtkCommonCore-pv5.12.5.12.dylib) ???:-1]
566: 0x13bed5a2c : vtkSubjectHelper::InvokeEvent(unsigned long, void*, vtkObject*) [(libvtkCommonCore-pv5.12.5.12.dylib) ???:-1]
566: 0x13bed6430 : vtkObject::InvokeEvent(unsigned long, void*) [(libvtkCommonCore-pv5.12.5.12.dylib) ???:-1]
566: 0x105c8bd18 : vtkRenderWindow::Render() [(libvtkRenderingCore-pv5.12.5.12.dylib) ???:-1]
566: 0x10222bcc8 : vtkOpenGLRenderWindow::Render() [(libvtkRenderingOpenGL2-pv5.12.5.12.dylib) ???:-1]
566: 0x102028034 : vtkCocoaRenderWindow::Render() [(libvtkRenderingOpenGL2-pv5.12.5.12.dylib) ???:-1]
566: 0x1006fa2b4 : vtkParallelRenderManager::RenderRMI() [(libvtkRenderingParallel-pv5.12.5.12.dylib) ???:-1]
566: 0x1006fbdf4 : RenderRMI(void*, void*, int, int) [(libvtkRenderingParallel-pv5.12.5.12.dylib) ???:-1]
566: 0x10196da00 : vtkMultiProcessController::ProcessRMI(int, void*, int, int) [(libvtkParallelCore-pv5.12.5.12.dylib) ???:-1]
566: 0x10196e860 : vtkMultiProcessController::ProcessRMIs(int, int) [(libvtkParallelCore-pv5.12.5.12.dylib) ???:-1]
566: 0x10196e09c : vtkMultiProcessController::ProcessRMIs() [(libvtkParallelCore-pv5.12.5.12.dylib) ???:-1]
566: 0x1006f8808 : vtkParallelRenderManager::StartServices() [(libvtkRenderingParallel-pv5.12.5.12.dylib) ???:-1]
566: 0x10014ac44 : TestADIOS2BPReaderMPISingleTimeStep(vtkMultiProcessController*, void*) [(vtkIOADIOS2CxxTests-MPI) ???:-1]
566: 0x100b46a40 : vtkMPIController::SingleMethodExecute() [(libvtkParallelMPI-pv5.12.5.12.dylib) ???:-1]
566: 0x10014b928 : TestADIOS2BPReaderMPISingleTimeStep(int, char**) [(vtkIOADIOS2CxxTests-MPI) ???:-1]
566: 0x100149aa8 : main [(vtkIOADIOS2CxxTests-MPI) ???:-1]
566: 0x19aa53f28 : start [(dyld) ???:-1]
566: =========================================================
I've tried a few other tests that use vtkCompositeRenderManager and they all hang. Only the ADIOS tests segfault, but I'm pretty sure the same problem causes both.
Doing some digging, the error seems to be in vtkCocoaRenderWindow::Render on rank 1. Rank 1 loops in vtkMultiProcessController::ProcessRMIs and should make exactly one Render call for each one that rank 0 makes. However, in the first iteration of the loop on rank 1, vtkCocoaRenderWindow::Render() ends up calling vtkOpenGLRenderWindow::Render() twice (the second call happens in the if branch shown below). This ultimately calls vtkParallelRenderManager::SatelliteStartRender, which has a Broadcast call in it. That works fine while rank 0 is still going into vtkParallelRenderManager::StartRender and broadcasting the data, but once rank 0 is done, the extra Render call leaves a Broadcast with no match, which leads to the hang at the end of the tests (or, for the ADIOS tests, a segfault). Commenting out the following code in vtkCocoaRenderWindow::Render fixes the tests (both the segfault and the hang):
  if (this->WindowCreated && neverRendered && !this->InRender && this->OnScreenInitialized &&
    this->Mapped)
  {
    NSRunLoop* mainRunLoop = [NSRunLoop mainRunLoop];
    [mainRunLoop runUntilDate:[NSDate distantPast]];
  }
I don't know what the correct fix for this is. I'm assuming that removing that code will break other things. Maybe the if needs an additional condition?