vtkGDALVectorReader may have performance issues

Given a 25MB shapefile (the input data can be found on Google Drive; it was just slightly too large to attach here), the goal is to compute the intersection of that data with a bounding box (hard-coded in the attached scripts) to produce a subset of the original data. VTK seems to perform this operation very, very slowly.

The vtkGDALVectorReader produces a vtkMultiBlockDataSet, and in the case of the input data linked above, this multiblock has over 150K blocks. The two attached scripts use this reader in different ways to perform the subset operation and render the results.

The testIntersection.py script uses that reader to load the dataset, then manually iterates over the blocks, appending each one with the vtkAppendPolyData filter. It then feeds the output of that filter to the vtkExtractPolyDataGeometry filter. This script takes approximately 30s to produce the first image on a fairly beefy Linux machine (Ubuntu 16.04, 128 GB RAM, Dual 8 Core Xeon(R) CPU @ 2.60 GHz, NVidia Quadro K220 GPU). After the first image is produced, interaction is very sluggish, under one frame per second.
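For readers without the attachment, the append-then-extract approach described above might look roughly like the sketch below. This is not the attached script. The function names, the use of vtkBox as the implicit function, and the z padding are assumptions, and the reader class is only present when VTK was built with GDAL support.

```python
def to_vtk_bounds(xmin, ymin, xmax, ymax, zmin=-1.0, zmax=1.0):
    """Reorder a 2D bounding box into VTK's (xmin, xmax, ymin, ymax, zmin, zmax).

    The z range is an assumed padding around z = 0 so that flat 2D
    geometry falls inside the box.
    """
    return (xmin, xmax, ymin, ymax, zmin, zmax)


def append_blocks_and_extract(shapefile_path, bounds):
    """Merge every polydata block from the reader, then extract the subset."""
    import vtk  # imported here so the helper above works without VTK installed

    reader = vtk.vtkGDALVectorReader()
    reader.SetFileName(shapefile_path)
    reader.Update()

    # Walk the leaves of the multiblock output and append each polydata block.
    append = vtk.vtkAppendPolyData()
    it = reader.GetOutput().NewIterator()
    it.InitTraversal()
    while not it.IsDoneWithTraversal():
        block = it.GetCurrentDataObject()
        if block is not None and block.IsA("vtkPolyData"):
            append.AddInputData(block)
        it.GoToNextItem()

    # Assumed: the bounding box is expressed as a vtkBox implicit function.
    box = vtk.vtkBox()
    box.SetBounds(*bounds)

    extract = vtk.vtkExtractPolyDataGeometry()
    extract.SetInputConnection(append.GetOutputPort())
    extract.SetImplicitFunction(box)
    extract.ExtractInsideOn()
    extract.Update()
    return extract.GetOutput()
```

With over 150K blocks, the loop above adds over 150K inputs to a single append filter, which is one place worth timing separately from the read and the extraction.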

The testIntersectionPipeline.py script uses the above reader and builds a pipeline (vtkGDALVectorReader -> vtkExtractPolyDataGeometry -> vtkCompositePolyDataMapper2). In this case, the first image is produced after about 4 minutes, but thereafter the interaction is much better, near 10 fps.
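A minimal sketch of the pipeline variant described above, again not the attached script itself. The function name and the vtkBox implicit function are assumptions. Class availability depends on the VTK version and build: the reader needs GDAL support, and vtkCompositePolyDataMapper2 may not exist in every release.

```python
def build_extract_pipeline(shapefile_path, bounds):
    """Connect reader -> extract -> composite mapper without executing it.

    bounds is (xmin, xmax, ymin, ymax, zmin, zmax) in VTK order.
    """
    import vtk  # imported lazily; requires a VTK build with GDAL support

    reader = vtk.vtkGDALVectorReader()
    reader.SetFileName(shapefile_path)

    # Assumed: the bounding box is expressed as a vtkBox implicit function.
    box = vtk.vtkBox()
    box.SetBounds(*bounds)

    # The filter is connected directly to the multiblock output, so the
    # pipeline applies it to the blocks one at a time.
    extract = vtk.vtkExtractPolyDataGeometry()
    extract.SetInputConnection(reader.GetOutputPort())
    extract.SetImplicitFunction(box)

    mapper = vtk.vtkCompositePolyDataMapper2()
    mapper.SetInputConnection(extract.GetOutputPort())

    actor = vtk.vtkActor()
    actor.SetMapper(mapper)
    return reader, extract, mapper, actor
```

Nothing runs until an Update() or the first render, so the cost of reading and extracting shows up in the time to first image.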

For comparison, the geopandas project manages to perform the same subset operation on the same input dataset in 10 seconds or so.
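The geopandas code used for this comparison is not included here, so the following is only a guess at what such a subset operation could look like. It assumes geopandas.clip with a shapely box as the mask. The function name and arguments are hypothetical.

```python
def subset_with_geopandas(shapefile_path, xmin, ymin, xmax, ymax):
    """Read a shapefile and clip its geometries to a bounding box."""
    import geopandas as gpd  # imported lazily; requires geopandas and shapely
    from shapely.geometry import box

    gdf = gpd.read_file(shapefile_path)
    # clip() cuts geometries at the mask boundary, which is an assumption
    # about what "intersection" means for this comparison.
    return gpd.clip(gdf, box(xmin, ymin, xmax, ymax))
```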

Attached are two Python scripts that illustrate the issue. Both scripts are instrumented with some timing code in an attempt to isolate the parts consuming the most time.

testIntersection.py

testIntersectionPipeline.py
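The timing instrumentation mentioned above could be as simple as the sketch below. The helper name `timed` and the optional results dictionary are hypothetical and not taken from the attached scripts.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print (and optionally record) the wall-clock time of a code block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.3f} s")
```

Because VTK pipelines are demand-driven, each stage has to be forced inside its own block, for example `with timed("read"): reader.Update()`. Otherwise all the work is attributed to the first render.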