Limit vtkDelimitedTextReader memory consumption
The memory footprint of vtkDelimitedTextReader is now much lower during computation (i.e. inside RequestData, called from Update).
The reader used to load all the data into string arrays and then run a post-processing pass to create numerical (int or double) arrays. At some point, the data therefore existed twice: once as strings and once as numeric values.
Instead, the reader now checks on the fly whether conversion is possible, which minimizes string allocation (and duplication).
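To illustrate the idea only, here is a minimal pure-Python sketch of on-the-fly conversion. It is not the reader's C++ implementation: `read_columns` and `append_value` are hypothetical names, and the widening order (int, then double, then string) is an assumption based on the description above. Each column starts as a compact integer array and is only widened when a value fails to convert, so no full string copy of a numeric column is ever held.

```python
import csv
import io
from array import array


def read_columns(stream, delimiter=","):
    """Read delimited text into typed columns, converting values while parsing."""
    rows = csv.reader(stream, delimiter=delimiter)
    headers = next(rows)
    # Every column starts as a compact integer array ("q" = signed 64-bit).
    columns = [array("q") for _ in headers]
    for row in rows:
        # Assumes every row has as many fields as the header line.
        for i, text in enumerate(row):
            columns[i] = append_value(columns[i], text)
    return dict(zip(headers, columns))


def append_value(column, text):
    """Append text to column, widening int -> double -> string only when needed."""
    if isinstance(column, list):
        # Column was already demoted to plain strings.
        column.append(text)
        return column
    if column.typecode == "q":
        try:
            column.append(int(text))
            return column
        except (ValueError, OverflowError):
            # Not an integer (or too large): widen stored values to doubles.
            column = array("d", (float(v) for v in column))
    try:
        column.append(float(text))
        return column
    except ValueError:
        # Not numeric at all: fall back to strings for this column only.
        # Sketch limitation: earlier values are re-rendered from numbers,
        # so their original spelling (e.g. "007") is not preserved.
        return [str(v) for v in column] + [text]


sample = io.StringIO("id,value,label\n1,1.5,x\n2,2,y\n")
columns = read_columns(sample)
print({name: type(col).__name__ for name, col in columns.items()})
```

In this sketch only the `label` column ends up stored as strings; `id` and `value` stay in numeric buffers for the whole read.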
Rough benchmark
Loading a CSV file of 100k lines with 9 numerical columns (3 int and 9 double), using this Python script:
import argparse
import vtk
parser = argparse.ArgumentParser()
parser.add_argument('-f', '--filepath')
args = parser.parse_args()
filepath = args.filepath
reader = vtk.vtkDelimitedTextReader()
reader.SetFileName(filepath)
reader.SetHaveHeaders(True)
reader.DetectNumericColumnsOn()
reader.Update()
Running it through heaptrack gives the following information:
| version | peak memory | peak contribution | calls to allocation | duration |
|---|---|---|---|---|
| master | 58.5 MB | 41.9 MB from vtkStringArray | 1,042,111 | 0.873 s |
| this topic | 17.9 MB | 6.3 MB from vtkBuffer | 39,784 | 0.344 s |
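For a quick read of these numbers, the short snippet below computes the improvement factors from the heaptrack figures. The values are copied from the table; the dictionary keys are illustrative names only.

```python
# Figures copied from the heaptrack table.
master = {"peak_mb": 58.5, "allocations": 1_042_111, "seconds": 0.873}
topic = {"peak_mb": 17.9, "allocations": 39_784, "seconds": 0.344}

# Ratio of master to this topic for each metric (higher means a bigger gain).
ratios = {key: master[key] / topic[key] for key in master}
for key, ratio in ratios.items():
    print(f"{key}: {ratio:.1f}x lower on this topic")
```

That works out to roughly 3.3x less peak memory, about 26x fewer allocation calls, and a run about 2.5x faster on this single file.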
Edited by Nicolas Vuaille