Unnecessary register preservation, caused by ArrayPortalBasicRead::Get not being inlined, dominates the cost of intersection logic.
When compiling the VTK-m master branch with
$ cmake ../ -DCMAKE_CXX_FLAGS="${CMAKE_CXX_FLAGS} -march=native -fno-omit-frame-pointer -Wfatal-errors -ffast-math -fno-finite-math-only -O3 -g" -DVTKm_ENABLE_EXAMPLES=ON -DVTKm_ENABLE_OPENMP=ON -DVTKm_ENABLE_TESTING=OFF -G Ninja
and running ./examples/demo/Demo under perf, I see that the dominant cost seems to be register-memory moves. These moves would be unnecessary if ArrayPortalBasicRead::Get were inlined, because the registers would then not need to be preserved across the call.
The part I would expect to be expensive (the memory read and the FMAs) is in fact cheap compared with preserving the registers across the function call boundary.
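To make the effect concrete, here is a minimal sketch that does not use VTK-m. HypotheticalReadPortal, MakePortal, AccumulateInlined and AccumulateCalled are all hypothetical names introduced for illustration. The sketch also assumes GCC/Clang attribute syntax (always_inline, noinline). Both loops do the same read plus FMA. The only difference is whether the accessor can be inlined into the loop. On x86-64 System V, all XMM registers are caller-saved, so a live double accumulator generally has to be spilled and reloaded around each out-of-line call. The exact code generated depends on the compiler and flags.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a read-only array portal (NOT the VTK-m class).
struct HypotheticalReadPortal {
  const double* data;
  std::size_t size;

  // Forced inline (GCC/Clang attribute): the load becomes part of the caller's
  // loop, so loop state can stay in registers.
  __attribute__((always_inline)) inline double Get(std::size_t i) const {
    assert(i < size);  // bounds check, active only without NDEBUG
    return data[i];
  }

  // Same accessor, deliberately kept out of line to mimic the profiled case:
  // the caller must preserve its live values across every call.
  __attribute__((noinline)) double GetNoInline(std::size_t i) const {
    assert(i < size);
    return data[i];
  }
};

// Hypothetical helper: wraps a vector's storage without copying it.
HypotheticalReadPortal MakePortal(const std::vector<double>& v) {
  return HypotheticalReadPortal{v.data(), v.size()};
}

// Read + FMA with the accessor inlined.
double AccumulateInlined(const HypotheticalReadPortal& p, double scale) {
  double acc = 0.0;
  for (std::size_t i = 0; i < p.size; ++i) {
    acc = std::fma(p.Get(i), scale, acc);
  }
  return acc;
}

// Identical arithmetic, but every read is a real call, so `acc`, `i` and the
// portal state are live across a call boundary on each iteration.
double AccumulateCalled(const HypotheticalReadPortal& p, double scale) {
  double acc = 0.0;
  for (std::size_t i = 0; i < p.size; ++i) {
    acc = std::fma(p.GetNoInline(i), scale, acc);
  }
  return acc;
}
```

Both functions compute the same result. Compiling this file with the same flags as above and comparing the two loops with objdump -d or perf annotate should show whether extra stack loads and stores appear around the call in AccumulateCalled. Within a single translation unit, interprocedural register allocation may reduce the difference, so treat this as an illustration rather than a reproduction of the Demo profile.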
Obviously not a high priority, but kinda a fun thing to look into.