VTK reader.SetBinaryInputString fails because the length of the binary string exceeds the maximum integer size
Hi all. I'm currently working on improving the speed of the PyVista contour function for structured grid data. Right now I'm attempting to contour multiple structured grids (which represent different types of data) in parallel with Python's multiprocessing. The process works serially, but takes too long for the intended end users of the application I'm developing. I've implemented Pools, a producer/consumer pattern using a Queue, and tried different multiprocessing libraries (such as mpire) to rule out the multiprocessing layer as the cause. After thorough inspection, the problem appears to be on VTK's side rather than in the multiprocessing libraries. Any help with this bug would be greatly appreciated, or if there is something I can change on my side to make things work, that'd be great too. I'm essentially stuck on improving the processing speed of the contour function until I can resolve this.
Here is the error output I get when contouring with multiprocessing (every multiprocessing variation I've tried produces this same error).
Contouring cloud data.
with multiprocessing selected
24 workers are available.
Contouring By Type Progress: 0%| | 0/4 [00:00<?, ?it/s]
1.0 18.0 -- Data Range
1006046 -- Length of the active scalar array outputted by the PyVista contour function
memory size: 94639
1.0 31.0
2981674
memory size: 284763
1.0 30.0
5064492
memory size: 486707
1.0 51.0
31213673
memory size: 3009878
Exception in thread Thread-3 (_handle_results):
Traceback (most recent call last):
File "/Users/erose/anaconda3/envs/cloudvis/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
self.run()
File "/Users/erose/anaconda3/envs/cloudvis/lib/python3.11/threading.py", line 975, in run
self._target(*self._args, **self._kwargs)
File "/Users/erose/anaconda3/envs/cloudvis/lib/python3.11/multiprocessing/pool.py", line 579, in _handle_results
task = get()
^^^^^
File "/Users/erose/anaconda3/envs/cloudvis/lib/python3.11/multiprocessing/connection.py", line 250, in recv
return _ForkingPickler.loads(buf.getbuffer())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/erose/anaconda3/envs/cloudvis/lib/python3.11/site-packages/pyvista/core/dataobject.py", line 522, in __setstate__
reader.SetBinaryInputString(vtk_serialized, len(vtk_serialized))
OverflowError: SetBinaryInputString argument 2: value is out of range for int
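For context on the overflow: the second argument of SetBinaryInputString is a C int, so the serialized dataset handed to it during unpickling must fit in 2**31 - 1 bytes (about 2 GiB). The last grid above (31 million scalars, ~3 GB memory size) is the first one to cross that threshold, which is why the smaller grids round-trip fine. A minimal sanity check illustrating the limit (fits_in_int is a hypothetical helper of mine, not part of PyVista or VTK):

```python
# C int upper bound used by SetBinaryInputString's length argument.
INT_MAX = 2**31 - 1

def fits_in_int(payload: bytes) -> bool:
    """True if len(payload) can be passed as SetBinaryInputString's length
    without raising OverflowError."""
    return len(payload) <= INT_MAX

print(INT_MAX)             # 2147483647
print(fits_in_int(b"ok"))  # True
```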
The code that produces this error is attached below. (cubeList is a list of Iris Cubes.)
def poolContour(cubeList):
    n_workers = 2 * cpu_count()
    print(f"{n_workers} workers are available.")
    with Pool(n_workers) as p:
        cntrs = p.map(ct.contourDataForPool, tqdm(cubeList, desc="Contouring By Type Progress"))
    return cntrs
def contourDataForPool(cubeData):
    '''
    Returns a PyVista PolyData contoured surface, based on the resolution and type of surface being contoured.

    Parameters:
    - cubeData: an Iris Cube, representing the cloud/terrain data to be contoured
    '''
    cntr_z_scale = low_res_z_scale
    cntr_z_offset = low_res_z_offset
    cntr_label = "low_res_grid"
    # if "elevation" in type: TYPE_ISOSURF = ELEV_ISOSURF
    cube_cntr = grid_for_scalar_cube_sph(cubeData, z_scale=cntr_z_scale, z_offset=cntr_z_offset,
                                         label=cntr_label, isGeo=True).cell_data_to_point_data(pass_cell_data=True)
    TYPE_ISOSURF = ELEV_ISOSURF
    # TYPE_ISOSURF = 50
    cube_cntr = cube_cntr.contour(isosurfaces=TYPE_ISOSURF, compute_scalars=True)
    print(np.nanmin(cube_cntr.active_scalars), np.nanmax(cube_cntr.active_scalars))
    print(len(cube_cntr.active_scalars))
    print("memory size: " + str(cube_cntr.actual_memory_size))
    return cube_cntr
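Until the underlying int-length limit is lifted, one workaround I've been considering is to avoid shipping the large PolyData object through the pickle-based result queue at all: have each worker write its result to disk and return only the file path, so only a tiny string crosses the process boundary and SetBinaryInputString is never called on a multi-GiB payload. A minimal sketch of the idea using only the standard library (save_result_to_disk and the path scheme are hypothetical names of mine, not part of my application or PyVista):

```python
import os
import tempfile

def save_result_to_disk(payload: bytes, tag: str) -> str:
    """Write a worker's (potentially multi-GiB) result to a temp file and
    return only the small, easily-pickled file path."""
    path = os.path.join(tempfile.gettempdir(), f"contour_{tag}.bin")
    with open(path, "wb") as f:
        f.write(payload)
    return path

# With PyVista the same idea would be mesh.save(path) inside the worker
# and pyvista.read(path) in the parent process after p.map() returns.
```

This trades some disk I/O for bypassing the pickling limit entirely, which may still be a net win given how large the contoured surfaces are.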
Thanks!