Issue opening large VTKHDF file
After discussion on the ParaView forum, I think there is an issue opening large VTKHDF files. I'll try to sum up what is happening from the original ParaView forum thread where we discussed the issue with @julien.fausty: https://discourse.paraview.org/t/issue-opening-large-vtkhdf-file/12810
We are developing a CFD code (SOD2D, https://gitlab.com/bsc_sod2d/sod2d_gitlab) and our output results use the VTKHDF format. Until now we have had no problems with it: we generate the output files and read them properly in ParaView. Since we use high-order Lagrangian elements, we have two options to save the meshes/results:
- Using high-order Lagrange hexahedra (we interpolate the results using an equidistant node distribution)
- Linearising the mesh (we "transform" and divide the p-order elements into several first-order hexahedra)
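For a sense of scale, here is a back-of-the-envelope sketch of what linearisation does to the cell count (the p^3 subdivision count is the standard one; the concrete element count and order below are illustrative, not necessarily our actual mesh):

```python
def linearised_cell_count(n_elements: int, p: int) -> int:
    """Each p-order hexahedron subdivides into p**3 linear hexahedra."""
    return n_elements * p ** 3

# illustrative numbers only: a ~16.8M-element 4th-order mesh already
# linearises to over a billion cells
print(linearised_cell_count(16_777_216, 4))  # -> 1073741824
```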
Up to here, no problem; everything was working well!
The problem arose a couple of weeks ago when pushing the software further: we computed a case on a mesh with more than 1 billion nodes. Trying to open the mesh or the results in ParaView produces the following error:
[…]
HDF5-DIAG: Error detected in HDF5 (1.12.1) thread 0:
#000: /builds/gitlab-kitware-sciviz-ci/build/superbuild/hdf5/src/src/H5Dio.c line 179 in H5Dread(): can't read data
major: Dataset
minor: Read failed
#001: /builds/gitlab-kitware-sciviz-ci/build/superbuild/hdf5/src/src/H5VLcallback.c line 2011 in H5VL_dataset_read(): dataset read failed
major: Virtual Object Layer
minor: Read failed
#002: /builds/gitlab-kitware-sciviz-ci/build/superbuild/hdf5/src/src/H5VLcallback.c line 1978 in H5VL__dataset_read(): dataset read failed
major: Virtual Object Layer
minor: Read failed
#003: /builds/gitlab-kitware-sciviz-ci/build/superbuild/hdf5/src/src/H5VLnative_dataset.c line 159 in H5VL__native_dataset_read(): could not get a validated dataspace from file_space_id
major: Invalid arguments to routine
minor: Bad value
#004: /builds/gitlab-kitware-sciviz-ci/build/superbuild/hdf5/src/src/H5S.c line 266 in H5S_get_validated_dataspace(): selection + offset not within extent
major: Dataspace
minor: Out of range
( 202.168s) [pvserver.46 ]vtkHDFReaderImplementat:864 ERR| vtkHDFReader (0x1514c150): Error H5Dread start: 18446744071577530368, 140159467271832, 0 count: 1555968, 354777680, 354776672
( 202.168s) [pvserver.46 ] vtkHDFReader.cxx:440 ERR| vtkHDFReader (0x1514c150): Cannot read the Connectivity array
( 202.168s) [pvserver.46 ] vtkExecutive.cxx:753 ERR| vtkPVCompositeDataPipeline (0x1514fe70): Algorithm vtkFileSeriesReader(0x1514e570) returned failure for request: vtkInformation (0x15227980)
Debug: Off
Modified Time: 163221
Reference Count: 1
Registered Events: (none)
Request: REQUEST_DATA
FROM_OUTPUT_PORT: 0
ALGORITHM_AFTER_FORWARD: 1
FORWARD_DIRECTION: 0
[…]
I have checked the mesh/result files in our code and the values look OK. I suspect the issue may be related to int32 vs. int64 for vtkIdType: this mesh goes above the int32 limit, and in fact we had to refactor our code for these cases to allow storing larger global ids. Of course, the mesh is partitioned across several ranks (5520 for this particular case), so the local ids do not reach the int32 limit, but maybe the vtkIdType offset variable in vtkHDFReader.cxx is affected when reading the HDF5 file. No idea, just a guess…
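Incidentally, the huge `start` value in the log above is exactly what a 32-bit wrap would produce. A minimal sketch, pure arithmetic with no VTK involved (the concrete offset below is chosen to match the logged number and is illustrative): truncating an offset just above 2^31 to a signed 32-bit value and then reprinting it as unsigned 64-bit reproduces the value in the error message.

```python
# illustrative offset just above the int32 limit
true_offset = 2_162_946_048            # > 2**31 - 1

# truncate to signed 32 bits, as a C "int" would
wrapped = true_offset & 0xFFFFFFFF
if wrapped >= 2**31:
    wrapped -= 2**32                   # -> -2132021248

# reinterpret the negative value as unsigned 64 bits,
# as printed in the error message
as_logged = wrapped % 2**64
print(as_logged)                       # -> 18446744071577530368
```

The result matches the `start: 18446744071577530368` in the vtkHDFReader error line, which is consistent with a value overflowing a 32-bit integer somewhere on the read path.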
We have tried two different ParaView versions (5.10.1 and 5.11) and get the same error. We asked our cluster's support team, and they told us that the ParaView versions on the cluster are the precompiled binaries, so I expect the VTK_USE_64BIT_IDS compilation flag to be set, but I cannot be sure…
In summary, our code is able to read and use the mesh file (stored in HDF5), but ParaView cannot open it and gives the error posted above. All the meshes we have produced until now have been fine; this error only showed up with this very large mesh. Here are its sizes:
HDF5 "cube-5520.hdf" {
GROUP "VTKHDF" {
   ATTRIBUTE "Type" {
      DATATYPE H5T_STRING {
         STRSIZE 16;
         STRPAD H5T_STR_NULLPAD;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE SCALAR
   }
   ATTRIBUTE "Version" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
   }
   GROUP "CellData" {
      DATASET "mpi_rank" {
         DATATYPE H5T_STD_U8LE
         DATASPACE SIMPLE { ( 1073741824 ) / ( 1073741824 ) }
      }
   }
   DATASET "Connectivity" {
      DATATYPE H5T_STD_I64LE
      DATASPACE SIMPLE { ( 8589934592 ) / ( 8589934592 ) }
   }
   GROUP "FieldData" {
   }
   DATASET "NumberOfCells" {
      DATATYPE H5T_STD_I64LE
      DATASPACE SIMPLE { ( 5520 ) / ( 5520 ) }
   }
   DATASET "NumberOfConnectivityIds" {
      DATATYPE H5T_STD_I64LE
      DATASPACE SIMPLE { ( 5520 ) / ( 5520 ) }
   }
   DATASET "NumberOfPoints" {
      DATATYPE H5T_STD_I64LE
      DATASPACE SIMPLE { ( 5520 ) / ( 5520 ) }
   }
   DATASET "Offsets" {
      DATATYPE H5T_STD_I64LE
      DATASPACE SIMPLE { ( 1073747344 ) / ( 1073747344 ) }
   }
   GROUP "PointData" {
   }
   DATASET "Points" {
      DATATYPE H5T_IEEE_F32LE
      DATASPACE SIMPLE { ( 1147407183, 3 ) / ( 1147407183, 3 ) }
   }
   DATASET "Types" {
      DATATYPE H5T_STD_U8LE
      DATASPACE SIMPLE { ( 1073741824 ) / ( 1073741824 ) }
   }
}
}
We checked the NumberOf* datasets and they look OK. I have also checked the Connectivity array, plotting its values every 8000 positions and at different start points of the array, and it also looks good (see https://drive.google.com/drive/folders/1sj3SAcNpb7hb-vpqvNT7pgx4YYGJ_l70).
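As a further cross-check, the sizes in the dump above are mutually consistent, assuming 8-node linear hexahedra and one extra offset entry per rank partition (a quick sanity sketch, not taken from the reader's code):

```python
n_cells = 1_073_741_824    # "Types" / "mpi_rank" length
n_ranks = 5_520            # "NumberOfCells" length
n_conn = 8_589_934_592     # "Connectivity" length
n_offsets = 1_073_747_344  # "Offsets" length

assert n_conn == 8 * n_cells           # 8 connectivity ids per linear hexahedron
assert n_offsets == n_cells + n_ranks  # one extra offset entry per partition
print("dump sizes are consistent")
```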
Since we suspected the issue came from the size of the array being passed through a 32-bit integer at some point, I created a tool that lets me 'externally' link the original failing HDF5 file and include only some of the ranks (the original mesh was partitioned across 5520 ranks). If I build a 'new' mesh including only the first 1300 ranks, so that the "Connectivity" array has a size of 2022899200, I can open the mesh in ParaView. On the other hand, if I include the first 1400 ranks, with a "Connectivity" size of 2178507264, ParaView gives the same error detailed before. The int32 limit is 2147483647, so these two sizes sit just under and just above it. I think this confirms that the issue is a 32-bit integer problem.
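The two subset sizes straddle the signed 32-bit limit, which is easy to confirm:

```python
INT32_MAX = 2**31 - 1                  # 2147483647

conn_1300_ranks = 2_022_899_200        # opens fine in ParaView
conn_1400_ranks = 2_178_507_264        # fails with the error above

print(conn_1300_ranks <= INT32_MAX)   # -> True
print(conn_1400_ranks > INT32_MAX)    # -> True
```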
With this in mind, we compiled ParaView 5.11.1 on our cluster from source with the flag VTK_USE_64BIT_IDS=ON, so in theory the vtkIdType variables must be 64-bit. However, when opening the original file with this compiled ParaView version, the error was exactly the same as with the precompiled version.
The mesh causing this issue is large (it weighs 125 GB), so it is difficult to share, but if needed I can try.
Thank you so much and if you need anything from my side please do not hesitate to ask.
BR!
EDIT
To reproduce this issue, run the script generate_unstructured_grid.py. You will be able to open the generated file in VTK. Then use the variable nCubePerDim_64Bit instead of nCubePerDim_32Bit in the generate_data() method and rerun the script. You will hit the same issue as above with the Connectivity array.
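Assuming the script builds a cube of nCubePerDim^3 linear hexahedra with 8 connectivity ids per cell (an assumption about generate_unstructured_grid.py; adjust if the script differs), the smallest edge length that pushes the Connectivity array past the int32 limit can be found directly:

```python
INT32_MAX = 2**31 - 1

def connectivity_size(n_cube_per_dim: int) -> int:
    # assumption: n_cube_per_dim**3 linear hexahedra, 8 connectivity ids each
    return 8 * n_cube_per_dim ** 3

n = 1
while connectivity_size(n) <= INT32_MAX:
    n += 1
print(n)   # -> 646: first edge length whose connectivity overflows int32
```

Any nCubePerDim at or above this threshold should trigger the error; anything below it should open cleanly, matching the 32Bit/64Bit variable pair in the script.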