
Fix whitespace after CRLF in delimited reader

vtkDelimitedTextReader can be used to read delimited file formats such as CSV and TSV. It skips empty lines by discarding CRLF and whitespace at the beginning of each new line. This, however, also discards leading whitespace in the first field of a record. Here's an example:

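import vtk

# newline='' stops Python from translating '\n', so the file ends with exactly one CRLF.
with open('file.csv', 'w', newline='') as f:
  f.write('   foo   ,   bar   ,   biz   \r\n')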

reader = vtk.vtkDelimitedTextReader()
reader.SetFileName('file.csv')
reader.Update()
table = reader.GetOutput()

print(table.GetValue(0, 0))  # "foo   "
print(table.GetValue(0, 1))  # "   bar   "
print(table.GetValue(0, 2))  # "   biz   "

Notice how the first field is shown as "foo   " when it should be shown as "   foo   ". When parsing TSVs, a leading "\t" delimiter is also skipped because it is whitespace, which means an empty first field is dropped entirely.
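
The TSV case can be illustrated without VTK. The sketch below is a plain-Python model of the behavior described above, not the reader's actual implementation; both helper names are hypothetical.

```python
def parse_stripping_leading_whitespace(line, delimiter):
    # Model of the current behavior: whitespace at the start of a
    # record is skipped before the fields are split.
    return line.lstrip(" \t\r\n").split(delimiter)


def parse_preserving_whitespace(line, delimiter):
    # Model of the expected behavior: the record is split as-is.
    return line.split(delimiter)


record = "\tfoo\tbar"  # TSV record whose first field is empty

stripped = parse_stripping_leading_whitespace(record, "\t")
preserved = parse_preserving_whitespace(record, "\t")

print(stripped)   # ['foo', 'bar']
print(preserved)  # ['', 'foo', 'bar']
```

With stripping, the record loses a column, so every remaining value shifts one position to the left.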

I propose a fix that discards extra CRLFs but keeps whitespace. However, I do not know whether this breaks parsing for use cases that rely on discarding lines containing only whitespace.
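
As a rough sketch of the intended behavior (plain Python, not the actual change to the reader; `split_records` is a hypothetical name), extra CRLFs are dropped while whitespace is kept:

```python
def split_records(text, delimiter=","):
    """Split text into records, dropping only lines that are truly empty."""
    records = []
    for line in text.split("\n"):
        line = line.rstrip("\r")  # drop the CR of a CRLF pair
        if line == "":
            continue  # extra CRLF: discard
        # Whitespace is field content here, so it is kept untouched.
        records.append(line.split(delimiter))
    return records


text = "   foo   ,   bar   \r\n\r\n\r\n\t1\t,2\r\n   \r\n"
records = split_records(text)
for record in records:
    print(record)
```

Under this model the final line, which contains only spaces, becomes a record with a single whitespace field. That is the case the open question above is about.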

Edited by Gabriel Müller
