Limit vtkDelimitedTextReader memory consumption

The memory footprint of vtkDelimitedTextReader is now much lower during computation (i.e. inside RequestData, called from Update).

It used to read all the data into string arrays and then run a post-processing pass to create numerical (int or double) arrays. Thus, at some point the data was duplicated as both string and numeric arrays.

Instead, the reader now checks on the fly whether conversion is possible, which minimizes string allocation (and duplication).
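To illustrate the idea of converting on the fly, here is a minimal Python sketch. It is not the reader's actual C++ implementation: the `ColumnBuilder` class and `read_columns` function are hypothetical names invented for this example, and Python's `int()` and `float()` parsing rules stand in for whatever checks the reader really performs. Each column starts as int, is promoted to double when a value needs it, and falls back to string only when a value is not numeric at all.

```python
import csv
import io


class ColumnBuilder:
    """Hypothetical helper: stores one column in the narrowest type seen so far."""

    def __init__(self):
        self.kind = "int"  # one of "int", "double", "string"
        self.values = []

    def append(self, text):
        text = text.strip()
        if self.kind == "int":
            try:
                self.values.append(int(text))
                return
            except ValueError:
                pass  # not an int, try double next
        if self.kind in ("int", "double"):
            try:
                value = float(text)
            except ValueError:
                # Not numeric: fall back to strings for the whole column.
                # Simplification: values already parsed are re-rendered with
                # str(), so their original text formatting is not preserved.
                self.values = [str(v) for v in self.values]
                self.kind = "string"
            else:
                if self.kind == "int":
                    # Promote the values stored so far from int to double.
                    self.values = [float(v) for v in self.values]
                    self.kind = "double"
                self.values.append(value)
                return
        self.values.append(text)


def read_columns(stream):
    """Read delimited text with a header row, typing each column as it is read."""
    reader = csv.reader(stream)
    headers = next(reader)
    builders = [ColumnBuilder() for _ in headers]
    for row in reader:
        for builder, cell in zip(builders, row):
            builder.append(cell)
    return dict(zip(headers, builders))


sample = "a,b,c\n1,1.5,x\n2,2,y\n"
columns = read_columns(io.StringIO(sample))
for name, column in columns.items():
    print(name, column.kind, column.values)
```

In this sketch a numeric column never exists as a full array of strings, which is the duplication the change is meant to avoid.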

Rough benchmark

Loading a CSV file of 100k lines, with 9 numerical columns (3 int and 9 double), with this Python script:

import argparse
import vtk

parser = argparse.ArgumentParser()
parser.add_argument('-f', '--filepath')
args = parser.parse_args()
filepath = args.filepath

reader = vtk.vtkDelimitedTextReader()
reader.SetFileName(filepath)
reader.SetHaveHeaders(True)
reader.DetectNumericColumnsOn()

reader.Update()
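The input file used for the measurement is not part of this description. As a rough sketch, a comparable file could be generated as below. The function name `write_benchmark_csv`, the column names, the value ranges and the default column counts are all assumptions made for illustration, not the layout of the original file.

```python
import csv
import os
import random
import tempfile


def write_benchmark_csv(path, rows=100_000, int_cols=3, double_cols=9, seed=0):
    """Write a CSV file with a header row, then `rows` lines of random numbers.

    Hypothetical generator: the column layout is a guess, adjust as needed.
    """
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    headers = [f"int_{i}" for i in range(int_cols)]
    headers += [f"double_{i}" for i in range(double_cols)]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        for _ in range(rows):
            ints = [rng.randint(-1000, 1000) for _ in range(int_cols)]
            doubles = [rng.uniform(-1.0, 1.0) for _ in range(double_cols)]
            writer.writerow(ints + doubles)
    return headers


# Small demonstration: write 5 rows to a temporary directory and read them back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    demo_headers = write_benchmark_csv(demo_path, rows=5)
    with open(demo_path, newline="") as f:
        demo_rows = list(csv.reader(f))
```

Calling `write_benchmark_csv` with the default `rows` value produces a 100k-line file whose path can then be passed to the script above through `-f`.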

Running it through heaptrack gives the following information:

| version | peak memory | peak contribution | calls to allocation | duration |
|---|---|---|---|---|
| master | 58.5 MB | 41.9 MB from vtkStringArray | 1,042,111 | 0.873 s |
| this topic | 17.9 MB | 6.3 MB from vtkBuffer | 39,784 | 0.344 s |
Edited by Nicolas Vuaille
