Redesign the Dispatcher to not need FunctionInterface to convert dynamic types (!1010) · Merge requests · VTK / VTK-m

Robert Maynard requested to merge robertmaynard/vtk-m:dispatcher_base_leverage_new_cast_and_call into master Nov 22, 2017

Here are the binary size and build time improvements:

OSX + apple clang 7.3.0 + tbb binary size(s)

target	master	branch
vtkm_cont	1.8M (1873376)	1.8M (1868120)
vtkm_rendering	11M (11909996)	8.4M (8769036)
UnitTests_vtkm_filter_testing	24M (24993148)	19M (19629748)
WorkletTests_TBB	15M (15220992)	12M (12361872)
Clipping_TBB	2.3M (2424148)	1.6M (1664612)

Ubuntu 14.04 + GCC 6.3 + CUDA 9 + tbb binary size(s)

target	master	branch
vtkm_cont	6.7M (6970504)	6.4M (6708512)
vtkm_rendering	31M (32032200)	30M (30642792)
UnitTests_vtkm_filter_testing	18M (18672400)	17M (16796624)
WorkletTests_TBB	11M (11461760)	10M (10541256)
Clipping_TBB	1.8M (1835848)	1.5M (1519104)
WorkletTests_CUDA	112M (117102256)	111M (115710000)
Clipping_CUDA	7.4M (7752912)	6.9M (7221040)

In terms of binary size, these changes have only a small effect with a newer GCC (vtkm_rendering shrinks by about 4%), but the older version of clang now produces noticeably less code (vtkm_rendering shrinks by about 26%).
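The relative size changes can be recomputed from the byte counts in the two tables above. The following is a minimal sketch, not part of the merge request: the `SIZES` dictionary and the `percent_reduction` helper are names introduced here for illustration, and the numbers are copied from the tables.

```python
# Byte counts copied from the binary size tables: (master, branch).
# SIZES and percent_reduction are illustrative names, not VTK-m code.
SIZES = {
    "apple clang 7.3.0": {
        "vtkm_cont": (1873376, 1868120),
        "vtkm_rendering": (11909996, 8769036),
        "UnitTests_vtkm_filter_testing": (24993148, 19629748),
        "WorkletTests_TBB": (15220992, 12361872),
        "Clipping_TBB": (2424148, 1664612),
    },
    "GCC 6.3 + CUDA 9": {
        "vtkm_cont": (6970504, 6708512),
        "vtkm_rendering": (32032200, 30642792),
        "UnitTests_vtkm_filter_testing": (18672400, 16796624),
        "WorkletTests_TBB": (11461760, 10541256),
        "Clipping_TBB": (1835848, 1519104),
        "WorkletTests_CUDA": (117102256, 115710000),
        "Clipping_CUDA": (7752912, 7221040),
    },
}


def percent_reduction(master, branch):
    """Return how much smaller `branch` is than `master`, in percent."""
    return 100.0 * (master - branch) / master


if __name__ == "__main__":
    for compiler, targets in SIZES.items():
        for target, (master, branch) in targets.items():
            change = percent_reduction(master, branch)
            print(f"{compiler:18s} {target:30s} {change:5.1f}%")
```

Reporting the change relative to the master size makes the clang and GCC results directly comparable, even though the absolute sizes differ between the two platforms.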

OSX + apple clang 7.3.0 + tbb build time (j1)

target	master	branch
vtkm_cont && vtkm_rendering	245.25s	212.66s
UnitTests_vtkm_filter_testing	449.08s	397.15s
WorkletTests_TBB	318.16s	290.68s
Clipping_TBB	31.21s	25.40s

Ubuntu 14.04 + GCC 6.3 + CUDA 9 + tbb build time (j4)

target	master	branch
vtkm_cont && vtkm_rendering	7m23.606s	6m4.577s
UnitTests_vtkm_filter_testing	2m12.626s	2m4.436s
WorkletTests_TBB	1m38.271s	1m29.919s
Clipping_TBB	0m32.570s	0m25.156s
WorkletTests_CUDA	9m20.429s	9m7.074s
Clipping_CUDA	1m28.056s	1m15.928s

Compile times show a real improvement. The primary saving is in the core libraries (vtkm_cont and vtkm_rendering), which build roughly 13-18% faster (245.25s to 212.66s on OSX, 7m23.606s to 6m4.577s on Ubuntu).
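The build time savings can be checked the same way, with the extra step of converting the mixed `245.25s` and `7m23.606s` notations to seconds. This is a minimal sketch under the assumption that every entry is either plain seconds or minutes plus seconds; `to_seconds` and `percent_reduction` are hypothetical helpers introduced here, and only the core library rows from the tables are included.

```python
import re

# Matches "245.25s", "7m23.606s", and also "7m23.606" (trailing "s" optional).
_TIME_RE = re.compile(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s?")


def to_seconds(text):
    """Convert a build time string from the tables to seconds."""
    match = _TIME_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized time: {text!r}")
    minutes, seconds = match.groups()
    return 60 * int(minutes or 0) + float(seconds)


def percent_reduction(master, branch):
    """Return the build time saved on the branch, in percent of master."""
    m, b = to_seconds(master), to_seconds(branch)
    return 100.0 * (m - b) / m


# Core library rows (vtkm_cont && vtkm_rendering) copied from the tables.
CORE_LIBS = {
    "OSX + apple clang 7.3.0 (j1)": ("245.25s", "212.66s"),
    "Ubuntu 14.04 + GCC 6.3 (j4)": ("7m23.606s", "6m4.577s"),
}

if __name__ == "__main__":
    for platform, (master, branch) in CORE_LIBS.items():
        saved = percent_reduction(master, branch)
        print(f"{platform:30s} {saved:5.1f}% less build time")
```

Note that the two platforms were measured with different parallelism (j1 versus j4), so the percentages are comparable within a platform but the absolute times are not.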
