`ReadPortal().Get(Idx)` is an antipattern
While looking into the cause of some performance problems, I found that calling `ReadPortal().Get(idx)` causes a significant slowdown. (Measured by running `TestingDeviceAdapter.h` under `perf`, compiled using `RelWithDebInfo`.)
The assembly has a hotspot in a string allocator inside `ReadPortal()`, which I found somewhat surprising.
Making the following change:

```diff
   std::cout << "Checking results." << std::endl;
+  auto portal = handle.ReadPortal();
   for (vtkm::Id index = 0; index < 1; index++)
   {
-    vtkm::Id value = handle.ReadPortal().Get(index);
+    vtkm::Id value = portal.Get(index);
     VTKM_TEST_ASSERT(value == index + OFFSET,
                      "Got bad value for single value scheduled kernel.");
   }
```
more than doubles the execution speed of the test, and the call to `ReadPortal()` nearly disappears from the flamegraph.
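The effect of hoisting the portal out of the loop can be illustrated outside VTK-m with a small, self-contained sketch. `MockArrayHandle`, `MockReadPortal`, `SumPerCall`, and `SumHoisted` are hypothetical names, not VTK-m API. The string built inside `MockArrayHandle::ReadPortal()` only models the kind of per-call setup cost the profile pointed at; it is an assumption, not a description of what the real method does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for an array handle and its read portal (NOT the VTK-m API).
// ReadPortal() models per-call setup work; Get() is a plain indexed load.
class MockReadPortal
{
public:
  explicit MockReadPortal(const std::vector<long>& data) : Data(&data) {}
  long Get(std::size_t index) const
  {
    assert(index < this->Data->size());
    return (*this->Data)[index];
  }
private:
  const std::vector<long>* Data;
};

class MockArrayHandle
{
public:
  explicit MockArrayHandle(std::vector<long> data) : Data(std::move(data)) {}

  MockReadPortal ReadPortal() const
  {
    // Simulated per-call overhead: builds (allocates) a label string every time.
    std::string label = "ReadPortal of " + std::to_string(this->Data.size()) + " values";
    this->ReadPortalCalls += 1;
    this->LabelBytes += label.size();
    return MockReadPortal(this->Data);
  }

  std::size_t GetNumberOfValues() const { return this->Data.size(); }

  mutable std::size_t ReadPortalCalls = 0; // how often ReadPortal() was called
  mutable std::size_t LabelBytes = 0;      // total bytes of simulated labels

private:
  std::vector<long> Data;
};

// Antipattern: a fresh portal (and its setup cost) for every element.
long SumPerCall(const MockArrayHandle& handle)
{
  long sum = 0;
  for (std::size_t i = 0; i < handle.GetNumberOfValues(); ++i)
  {
    sum += handle.ReadPortal().Get(i);
  }
  return sum;
}

// Hoisted: pay the setup cost once, then only do cheap Get() calls.
long SumHoisted(const MockArrayHandle& handle)
{
  MockReadPortal portal = handle.ReadPortal();
  long sum = 0;
  for (std::size_t i = 0; i < handle.GetNumberOfValues(); ++i)
  {
    sum += portal.Get(i);
  }
  return sum;
}
```

On a four-element mock holding `{1, 2, 3, 4}`, both functions return 10, but `SumPerCall` leaves `ReadPortalCalls` at 4 while `SumHoisted` leaves it at 1: the setup cost scales with the number of lookups only in the first form.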
Is there any way to get better assembly generated here? I note that `ReadPortal()` might make a copy by design, judging from the following comment:

```cpp
temp.ReadPortal(); // Forces copy back to control.
```
A fix for internal usage is forthcoming in an MR, but if anyone has an idea for getting the compiler to generate better assembly here, that would be easier, and in addition, code like this:
```cpp
vtkm::UInt8 GetCellShape(vtkm::Id id) const override
{
  return this->FullCellSet.GetCellShape(this->ValidCellIds.ReadPortal().Get(id));
}
```
would not be a performance bug.
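One possible way to keep an accessor like `GetCellShape` cheap, assuming it is acceptable to hold a portal for the lifetime of the object, is to create the portal once (here in the constructor) and reuse it for every lookup. All class names below (`IdHandle`, `IdPortal`, `ShapeTable`, `CellSetBase`, `SubsetCellSet`) are hypothetical stand-ins, not VTK-m classes; `IdHandle` mimics shared, reference-counted storage and counts its `ReadPortal()` calls so the caching is observable. Whether a cached VTK-m portal stays valid if the underlying array is later modified or used on a device is not settled by this sketch; a real implementation would need to refresh the cached portal in that case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical portal: read-only view that shares ownership of the values.
class IdPortal
{
public:
  explicit IdPortal(std::shared_ptr<const std::vector<std::int64_t>> values)
    : Values(std::move(values)) {}
  std::int64_t Get(std::size_t index) const
  {
    assert(index < this->Values->size());
    return (*this->Values)[index];
  }
private:
  std::shared_ptr<const std::vector<std::int64_t>> Values;
};

// Hypothetical handle: copies share storage and the call counter.
class IdHandle
{
public:
  explicit IdHandle(std::vector<std::int64_t> values)
    : Values(std::make_shared<std::vector<std::int64_t>>(std::move(values)))
    , Calls(std::make_shared<std::size_t>(0)) {}

  IdPortal ReadPortal() const
  {
    ++*this->Calls; // stands in for whatever per-call setup the real method does
    return IdPortal(this->Values);
  }
  std::size_t ReadPortalCalls() const { return *this->Calls; }
private:
  std::shared_ptr<const std::vector<std::int64_t>> Values;
  std::shared_ptr<std::size_t> Calls;
};

// Hypothetical "full" cell set: one shape code per cell.
class ShapeTable
{
public:
  explicit ShapeTable(std::vector<std::uint8_t> shapes) : Shapes(std::move(shapes)) {}
  std::uint8_t GetCellShape(std::int64_t id) const
  {
    return this->Shapes.at(static_cast<std::size_t>(id));
  }
private:
  std::vector<std::uint8_t> Shapes;
};

class CellSetBase
{
public:
  virtual ~CellSetBase() = default;
  virtual std::uint8_t GetCellShape(std::size_t id) const = 0;
};

class SubsetCellSet : public CellSetBase
{
public:
  SubsetCellSet(ShapeTable full, IdHandle validCellIds)
    : FullCellSet(std::move(full))
    , ValidCellIds(std::move(validCellIds))
    , ValidIdsPortal(this->ValidCellIds.ReadPortal()) // portal created once, here
  {}

  std::uint8_t GetCellShape(std::size_t id) const override
  {
    // Reuses the cached portal instead of calling ReadPortal() per lookup.
    return this->FullCellSet.GetCellShape(this->ValidIdsPortal.Get(id));
  }
private:
  ShapeTable FullCellSet;
  IdHandle ValidCellIds;   // declared before ValidIdsPortal so it is initialized first
  IdPortal ValidIdsPortal;
};
```

With valid ids `{2, 0}` over shapes `{10, 12, 14}`, `GetCellShape(0)` returns 14 and `GetCellShape(1)` returns 10, and `ReadPortal()` has been called exactly once no matter how many lookups follow. Creating the portal eagerly in the constructor, rather than lazily inside the `const` accessor, avoids needing a `mutable` cache and keeps concurrent reads free of initialization races.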