Particle Advection performance limited by vcvtsi2ss
The dominant contribution to the runtime of the particle advection filter is now calls to vtkm::exec::CellLocatorUniformGrid::FindCell. The assembly reveals that this function is itself dominated by vcvtsi2ss instructions.
Admittedly, I don't know if there's much that can be done about this, and perhaps it indicates that I'm at a point of diminishing returns for optimizing RK4Integrator. But previously, I've managed to (for example) use floating-point counters and various other workarounds to increase the speed of this sort of operation.
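
To make the floating-point counter idea concrete, here is a minimal sketch. It is not VTK-m code, and I'm not claiming this is the pattern inside FindCell. Both function names are hypothetical and exist only for illustration. The first version converts the integer index to float on every iteration, which is the kind of code that typically produces cvtsi2ss-family instructions on x86-64. The second carries a float counter alongside the loop, so no integer-to-float conversion is needed in the loop body.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of VTK-m): coordinates of n points on a
// uniform axis. The index is converted from int to float on each iteration.
std::vector<float> AxisCoordsWithConversion(float origin, float spacing, int n)
{
  assert(n >= 0);
  std::vector<float> coords(static_cast<std::size_t>(n));
  for (int i = 0; i < n; ++i)
  {
    coords[static_cast<std::size_t>(i)] = origin + static_cast<float>(i) * spacing;
  }
  return coords;
}

// Hypothetical helper (not part of VTK-m): same result, but a float counter
// is incremented next to the loop index instead of converting the index.
std::vector<float> AxisCoordsWithFloatCounter(float origin, float spacing, int n)
{
  // A float represents every integer up to 2^24 exactly, so the counter
  // matches the converted index only within that range.
  assert(n >= 0 && n <= (1 << 24));
  std::vector<float> coords(static_cast<std::size_t>(n));
  float fi = 0.0f;
  for (std::size_t i = 0; i < coords.size(); ++i)
  {
    coords[i] = origin + fi * spacing;
    fi += 1.0f; // stays exact while fi <= 2^24
  }
  return coords;
}
```

Whether this actually helps depends on the compiler and on where the converted values come from. An optimizer may already hoist or vectorize the conversion in a loop this simple, so any gain would need to be confirmed by measuring.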
Any ideas are welcome; I believe this could have broader impact throughout the library. Otherwise feel free to close.