HDF5 update in VTK breaks ParaView's H5PartReader
@utkarsh.ayachit @shawn.waldon @ben.boeckel
I'm trying to update the PV checkout of VTK in !2218 (merged), but there's a failure in the H5PartReader.
The crash is a symptom, not the disease. The real problem is that we're issuing an H5Dread call that overruns the buffer we supply:
==31080== Invalid write of size 8
==31080== at 0x4C33BB3: memmove (vg_replace_strmem.c:1258)
==31080== by 0x19AC33F8: H5D__contig_readvv_sieve_cb (H5Dcontig.c:784)
==31080== by 0x19D69132: vtkhdf5_H5VM_opvv (H5VM.c:1415)
==31080== by 0x19AC26C5: H5D__contig_readvv (H5Dcontig.c:939)
==31080== by 0x19AE866D: H5D__select_io (H5Dselect.c:210)
==31080== by 0x19AE8963: vtkhdf5_H5D__select_read (H5Dselect.c:276)
==31080== by 0x19AC1EA9: vtkhdf5_H5D__contig_read (H5Dcontig.c:604)
==31080== by 0x19ADE753: vtkhdf5_H5D__read (H5Dio.c:543)
==31080== by 0x19ADEE28: vtkhdf5_H5Dread (H5Dio.c:170)
==31080== by 0x243D04BE: vtkH5PartReader::RequestData(vtkInformation*, vtkInformationVector**, vtkInformationVector*) (vtkH5PartReader.cxx:782)
==31080== by 0x1A82B881: vtkPolyDataAlgorithm::ProcessRequest(vtkInformation*, vtkInformationVector**, vtkInformationVector*) (vtkPolyDataAlgorithm.cxx:90)
==31080== by 0x1A80CB55: vtkExecutive::CallAlgorithm(vtkInformation*, int, vtkInformationVector**, vtkInformationVector*) (vtkExecutive.cxx:775)
==31080== Address 0x5c983de0 is 0 bytes after a block of size 12,000 alloc'd
==31080== at 0x4C2CEDF: malloc (vg_replace_malloc.c:299)
==31080== by 0x1DA07AA9: vtkBuffer<float>::Allocate(long long) (vtkBuffer.h:129)
==31080== by 0x1DA061E9: vtkAOSDataArrayTemplate<float>::AllocateTuples(long long) (vtkAOSDataArrayTemplate.txx:439)
==31080== by 0x1DA0A932: vtkGenericDataArray<vtkAOSDataArrayTemplate<float>, float>::AllocateTuples(long long) (vtkGenericDataArray.h:324)
==31080== by 0x1DA07FB9: vtkGenericDataArray<vtkAOSDataArrayTemplate<float>, float>::Allocate(long long, long long) (vtkGenericDataArray.txx:390)
==31080== by 0x1DA08286: vtkGenericDataArray<vtkAOSDataArrayTemplate<float>, float>::SetNumberOfTuples(long long) (vtkGenericDataArray.txx:481)
==31080== by 0x243D0438: vtkH5PartReader::RequestData(vtkInformation*, vtkInformationVector**, vtkInformationVector*) (vtkH5PartReader.cxx:779)
==31080== by 0x1A82B881: vtkPolyDataAlgorithm::ProcessRequest(vtkInformation*, vtkInformationVector**, vtkInformationVector*) (vtkPolyDataAlgorithm.cxx:90)
==31080== by 0x1A80CB55: vtkExecutive::CallAlgorithm(vtkInformation*, int, vtkInformationVector**, vtkInformationVector*) (vtkExecutive.cxx:775)
==31080== by 0x1A804BF8: vtkDemandDrivenPipeline::ExecuteData(vtkInformation*, vtkInformationVector**, vtkInformationVector*) (vtkDemandDrivenPipeline.cxx:491)
==31080== by 0x1A7FA6EE: vtkCompositeDataPipeline::ExecuteData(vtkInformation*, vtkInformationVector**, vtkInformationVector*) (vtkCompositeDataPipeline.cxx:169)
==31080== by 0x1A8042F8: vtkDemandDrivenPipeline::ProcessRequest(vtkInformation*, vtkInformationVector**, vtkInformationVector*) (vtkDemandDrivenPipeline.cxx:273)
Digging in: we allocate a vtkFloatArray with 3000 values (12,000 bytes, the block size in the valgrind report) to hold the 1000 3D coordinates, and we read the data into that buffer. But the file stores these coordinates as doubles (H5T_IEEE_F64LE):
$ h5dump sample.h5part
...
DATASET "x" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 1000 ) / ( 1000 ) }
DATA {
(0): 53.772, 50.1184, 58.3454, 21.4543, 17.7776, 30.5534, 23.3462,
(7): 60.9427, 40.6856, 9.06256, 1.04324, 8.78282, 10.0275, 8.30659,
(14): 63.9312, 32.8277, 39.2089, 40.8033, 31.5893, 18.7211, 33.7117,
...
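As a quick check on the sizes, here is a minimal compile-time sketch comparing the buffer the reader allocates with what the same coordinates need as doubles. BufferBytes is a hypothetical helper, not part of VTK; the sketch assumes 4-byte float and 8-byte double, and assumes H5Dread is handed an 8-byte memory type, which the "Invalid write of size 8" in the log suggests.

```cpp
#include <cstddef>

// Hypothetical helper: bytes occupied by numTuples tuples of numComponents
// values of type T, stored contiguously (array-of-structs layout).
template <typename T>
constexpr std::size_t BufferBytes(std::size_t numTuples, std::size_t numComponents)
{
  return numTuples * numComponents * sizeof(T);
}

// Assumes the common IEEE sizes: 4-byte float, 8-byte double.
static_assert(sizeof(float) == 4 && sizeof(double) == 8, "unexpected floating-point sizes");

// 1000 points x 3 components as floats: the 12,000-byte block valgrind reports.
static_assert(BufferBytes<float>(1000, 3) == 12000, "float buffer size");

// The same points read as doubles need twice as much space.
static_assert(BufferBytes<double>(1000, 3) == 24000, "double buffer size");
```

Everything here is checked at compile time, so the translation unit only compiles if the arithmetic holds.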
Obviously we should not be using a float array here. We allocate the vtkFloatArray on this line:
vtkDataArray* temparray = vtkDataArray::CreateDataArray(GetVTKDataType(component_datatype));
The GetVTKDataType function is defined as follows:
int GetVTKDataType(int datatype)
{
  if (H5Tequal(datatype, H5T_NATIVE_FLOAT))
  {
    return VTK_FLOAT;
  }
  else if (H5Tequal(datatype, H5T_NATIVE_DOUBLE))
  {
    return VTK_DOUBLE;
  }
  else if
  ...
and H5Tequal is documented to return:
Returns:
Returns a positive value if the datatype identifiers refer to the same datatype.
Returns 0 if the datatype identifiers do not refer to the same datatype.
Returns a negative value when the function fails.
So instead of testing H5Tequal(...) directly, we should be testing H5Tequal(...) > 0. After fixing this, sure enough, there's no overrun or crash, and we now see:
ERROR: In paraview/ParaViewCore/VTKExtensions/H5PartReader/vtkH5PartReader.cxx, line 805
vtkH5PartReader (0x249c0b0): An unexpected data type was encountered
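A minimal sketch of the two checks, relying only on the documented tri-state return of H5Tequal. The H5Tequal results are passed in as plain ints so it compiles without HDF5; PickTypeBuggy, PickTypeFixed and the kUnknown/kFloat/kDouble constants are hypothetical stand-ins, not the reader's real code or VTK's type constants. When both comparisons fail with -1, the original test picks float, while the > 0 test falls through to "unknown", which is consistent with the error above.

```cpp
// Hypothetical stand-ins for the type codes GetVTKDataType returns.
constexpr int kUnknown = 0;
constexpr int kFloat = 1;
constexpr int kDouble = 2;

// Original pattern: any nonzero H5Tequal result, including a negative
// error code, is treated as a match.
constexpr int PickTypeBuggy(int equalsFloat, int equalsDouble)
{
  return equalsFloat ? kFloat : (equalsDouble ? kDouble : kUnknown);
}

// Fixed pattern: only a strictly positive result means "same datatype";
// 0 (different) and negative (failure) both fall through.
constexpr int PickTypeFixed(int equalsFloat, int equalsDouble)
{
  return equalsFloat > 0 ? kFloat : (equalsDouble > 0 ? kDouble : kUnknown);
}

// Both comparisons fail with -1: the original check silently picks float.
static_assert(PickTypeBuggy(-1, -1) == kFloat, "error code treated as a match");
static_assert(PickTypeFixed(-1, -1) == kUnknown, "error code rejected");

// A genuine double resolves correctly with either check.
static_assert(PickTypeBuggy(0, 1) == kDouble, "buggy check, real double");
static_assert(PickTypeFixed(0, 1) == kDouble, "fixed check, real double");
```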
The H5Tequal call was returning -1, which is nonzero and therefore true in an if condition, so GetVTKDataType was treating every type as a 32-bit float.
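One hypothesis worth checking, which this issue does not confirm: HDF5 1.10 widened hid_t from a 32-bit int to a 64-bit integer, while GetVTKDataType takes its datatype as int. If the update moved VTK onto such an HDF5, an identifier with upper bits set would be truncated on the way in, and H5Tequal would then fail on an invalid ID. The sketch below models only that narrowing step; hid_like_t and SurvivesIntParameter are hypothetical names, and the example identifier values are illustrative, not taken from this file.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for hid_t, a 64-bit signed integer in HDF5 1.10+.
using hid_like_t = std::int64_t;

// True if an identifier can pass through an `int` parameter, as in
// GetVTKDataType(int datatype), without losing information.
constexpr bool SurvivesIntParameter(hid_like_t id)
{
  return id >= std::numeric_limits<int>::min() && id <= std::numeric_limits<int>::max();
}

// An identifier within int range passes through unchanged...
static_assert(SurvivesIntParameter(42), "small id fits");
// ...but one with high bits set (illustrative value) cannot.
static_assert(!SurvivesIntParameter(hid_like_t{1} << 56), "high bits are lost");
```

If this turns out to be the cause, taking the parameter as hid_t rather than int would be the natural change, but that is untested here.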
So! Next steps:
- Do we have anyone who is familiar with this reader/HDF5 and could figure this out quickly? Let's get them looking at this.
- Can we revert the HDF5 update in VTK for now, so we can still bump VTK in PV until this is fixed? Or disable the H5PartReader if it's no longer needed?