Integrate Variables filter needs performance improvements
The integrate variables filter needs performance improvements. I believe this can be replicated as follows:
- Linux, builtin server, ParaView 5.5.2.
- Sources / Unstructured Cell Types / Tet / Block Dimensions 100 100 100. Apply.
- Integrate Variables filter. Apply.
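A million cells should not take this long to process. I would hope and expect the filter to take about as long as, or less than, creating the unstructured cell type source in the first place. In the log below, vtkCellTypeSource executes in about 2.1 seconds, while vtkIntegrateAttributes takes about 149 seconds. Here is a Timer Log: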
PropertiesPanel::Apply, 159.472 seconds
RenderView::Update, 10.0236 seconds
vtkPVView::Update, 10.0233 seconds
Execute vtkCellTypeSource id: 8000, 2.0739 seconds
Execute vtkGeometryRepresentationWithFaces id: 8315, 7.90282 seconds
UnstructuredCellTypes::GatherInformation, 0.110369 seconds
vtkPVView::Update, 149.183 seconds
Execute vtkIntegrateAttributes id: 8372, 149.181 seconds
<Snip>
Still Render, 0.036104 seconds
Render (use_lod: 0), (use_distributed_rendering: 0), (use_ordered_compositing: 0)
OpenGL Dev Render, 0.032254 seconds
Render (use_lod: 0), (use_distributed_rendering: 0), (use_ordered_compositing: 0)
Here is what I found. In file VTK/Filters/Parallel/vtkIntegrateAttributes,
in function vtkIntegrateAttributes::ExecuteBlock(), there is a for loop that runs through all cells in the dataset. Everything inside this loop is on the hot path and should be treated as performance-critical code.
    for (cellId = 0; cellId < numCells; ++cellId)
    {
      ....
      this->IntegrateTetrahedron(); // line 264.
      ....
    }
vtkIntegrateAttributes::IntegrateTetrahedron() calls
this->IntegrateData1() and then this->IntegrateData4().
Again, this happens per cell.
vtkIntegrateAttributes::IntegrateData1() and vtkIntegrateAttributes::IntegrateData4() have the following:
    // GetNumberOfFields calls Prune, which then leads to garbage collection -
    // PER CELL.
    // This is a major slowdown.
    numArrays = fieldList.GetNumberOfFields();
    // I assume this loops over every variable? If so, we are now visiting
    // every variable of every cell.
    for (i = 0; i < numArrays; ++i)
    {
      // GetFieldIndex is slow. I didn't dig into why. Please look, and speed it up.
      if (fieldList.GetFieldIndex(i) < 0)
        continue;
      // A comment in the code says we could template for speed.
      // NO - isn't it possible to pick up this index once?
      inArray = inda->GetArray(fieldList.GetDSAIndex(index, i));
      // GetFieldIndex is being called a second time! Get rid of this second call,
      // and use the result from the first call. Also, isn't it possible to get
      // this array once?
      outArray = outda->GetArray(fieldList.GetFieldIndex(i));
      numComponents = inArray->GetNumberOfComponents();
      // Surprisingly, everything from here on down is noise in the profile.
      for (j = 0; j < numComponents; ++j)
      {
        vIn1 = inArray->GetComponent(pt1Id, j);
        vOut = outArray->GetComponent(0, j);
        dv = vIn1;
        vOut += dv * k;
        outArray->SetComponent(0, j, vOut);
      }
    }
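To make the suggestion in the comments above concrete, here is a minimal sketch of resolving the input/output array pairs once per block and reusing them for every cell. This is not the VTK code and not a proposed patch: `Array`, `FieldMap`, `ArrayPair`, `ResolvePairs`, and `AccumulatePoint` are hypothetical stand-ins that only mimic the shape of the loop, with a negative output index playing the role of `GetFieldIndex(i) < 0`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a data array: tuples of numComponents doubles,
// stored tuple by tuple.
struct Array
{
  int numComponents;
  std::vector<double> values;

  double Get(std::size_t tuple, int comp) const
  {
    return values[tuple * numComponents + comp];
  }
  double& At(std::size_t tuple, int comp)
  {
    return values[tuple * numComponents + comp];
  }
};

// Hypothetical stand-in for the field list: field i maps to an input array
// index and an output array index. A negative output index means "skip".
struct FieldMap
{
  std::vector<int> inputIndex;
  std::vector<int> outputIndex;
};

// One resolved input/output pair, looked up once and then reused.
struct ArrayPair
{
  const Array* in;
  Array* out;
};

// Do all index lookups ONCE per block, outside the per-cell loop.
std::vector<ArrayPair> ResolvePairs(
  const FieldMap& map, const std::vector<Array>& inputs, std::vector<Array>& outputs)
{
  assert(map.inputIndex.size() == map.outputIndex.size());
  std::vector<ArrayPair> pairs;
  for (std::size_t i = 0; i < map.outputIndex.size(); ++i)
  {
    if (map.outputIndex[i] < 0)
    {
      continue; // field not present in the output, same role as the < 0 test above
    }
    const Array* in = &inputs[map.inputIndex[i]];
    Array* out = &outputs[map.outputIndex[i]];
    assert(in->numComponents == out->numComponents);
    pairs.push_back({ in, out });
  }
  return pairs;
}

// Per-cell work: only the arithmetic remains, no lookups.
// Adds k * (value at point ptId) into tuple 0 of each output array.
void AccumulatePoint(const std::vector<ArrayPair>& pairs, std::size_t ptId, double k)
{
  for (const ArrayPair& p : pairs)
  {
    for (int j = 0; j < p.in->numComponents; ++j)
    {
      p.out->At(0, j) += p.in->Get(ptId, j) * k;
    }
  }
}
```

With this shape, `ResolvePairs` would be called once before the cell loop and `AccumulatePoint` once per point of each cell, so the cost of the index lookups no longer scales with the number of cells. Whether the real field list allows its indices to be cached this way is an assumption that would need to be checked against the VTK source.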
I am attaching the performance profile that I acquired from VTune. This shows GetNumberOfFields expanded.
Here is GetFieldIndex expanded:
And here is GetDSAIndex expanded:
Edited by W. Alan Scott