
Redesign the Dispatcher to not need FunctionInterface to convert dynamic types

Here are the performance improvements

OSX + apple clang 7.3.0 + tbb binary size(s)

| target | master | branch |
| --- | --- | --- |
| vtkm_cont | 1.8M (1873376) | 1.8M (1868120) |
| vtkm_rendering | 11M (11909996) | 8.4M (8769036) |
| UnitTests_vtkm_filter_testing | 24M (24993148) | 19M (19629748) |
| WorkletTests_TBB | 15M (15220992) | 12M (12361872) |
| Clipping_TBB | 2.3M (2424148) | 1.6M (1664612) |

Ubuntu 14.04 + GCC 6.3 + CUDA 9 + tbb binary size(s)

| target | master | branch |
| --- | --- | --- |
| vtkm_cont | 6.7M (6970504) | 6.4M (6708512) |
| vtkm_rendering | 31M (32032200) | 30M (30642792) |
| UnitTests_vtkm_filter_testing | 18M (18672400) | 17M (16796624) |
| WorkletTests_TBB | 11M (11461760) | 10M (10541256) |
| Clipping_TBB | 1.8M (1835848) | 1.5M (1519104) |
| WorkletTests_CUDA | 112M (117102256) | 111M (115710000) |
| Clipping_CUDA | 7.4M (7752912) | 6.9M (7221040) |

As far as binary size reduction goes, these changes have no significant effect with a newish GCC, but the older version of clang now produces noticeably less code (for example, vtkm_rendering shrinks from 11M to 8.4M).
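To make the comparison between compilers concrete, the relative size reduction can be computed from the byte counts in the tables above. This is only a small arithmetic sketch: the helper name `size_reduction_percent` and the choice of two sample targets are illustrative assumptions, not part of the merge request.

```python
def size_reduction_percent(master_bytes, branch_bytes):
    """Percent by which the branch binary is smaller than the master binary."""
    return (master_bytes - branch_bytes) / master_bytes * 100.0

# (master, branch) byte counts copied from the tables above.
clang_sizes = {
    "vtkm_rendering": (11909996, 8769036),
    "WorkletTests_TBB": (15220992, 12361872),
}
gcc_sizes = {
    "vtkm_rendering": (32032200, 30642792),
    "WorkletTests_TBB": (11461760, 10541256),
}

clang_reduction = {t: size_reduction_percent(*s) for t, s in clang_sizes.items()}
gcc_reduction = {t: size_reduction_percent(*s) for t, s in gcc_sizes.items()}

for target in clang_sizes:
    print(f"{target}: clang {clang_reduction[target]:.1f}% smaller, "
          f"gcc {gcc_reduction[target]:.1f}% smaller")
```

For these two targets the reduction under apple clang 7.3.0 is several times larger than under GCC 6.3, which matches the observation above.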

OSX + apple clang 7.3.0 + tbb build time (j1)

| target | master | branch |
| --- | --- | --- |
| vtkm_cont && vtkm_rendering | 245.25s | 212.66s |
| UnitTests_vtkm_filter_testing | 449.08s | 397.15s |
| WorkletTests_TBB | 318.16s | 290.68s |
| Clipping_TBB | 31.21s | 25.40s |

Ubuntu 14.04 + GCC 6.3 + CUDA 9 + tbb build time (j4)

| target | master | branch |
| --- | --- | --- |
| vtkm_cont && vtkm_rendering | 7m23.606s | 6m4.577s |
| UnitTests_vtkm_filter_testing | 2m12.626s | 2m4.436s |
| WorkletTests_TBB | 1m38.271s | 1m29.919s |
| Clipping_TBB | 0m32.570s | 0m25.156s |
| WorkletTests_CUDA | 9m20.429s | 9m7.074s |
| Clipping_CUDA | 1m28.056s | 1m15.928s |

Compile times are where we start to see a real improvement, with the primary saving being that the core libraries (vtkm_cont and vtkm_rendering) compile about 15-20% faster.
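The 15-20% figure can be checked against the core library rows of the two build time tables. The sketch below assumes "faster" means the speedup ratio (master time divided by branch time, minus one); the helper names `to_seconds` and `speedup_percent` are hypothetical and exist only for this illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a `time`-style XmY.ZZZs reading into seconds."""
    return minutes * 60.0 + seconds

def speedup_percent(master_s, branch_s):
    """How much faster the branch build is, as a percentage of the branch time."""
    return (master_s / branch_s - 1.0) * 100.0

# vtkm_cont && vtkm_rendering rows from the build time tables above.
osx_speedup = speedup_percent(245.25, 212.66)
ubuntu_speedup = speedup_percent(to_seconds(7, 23.606), to_seconds(6, 4.577))

print(f"OSX (j1): {osx_speedup:.1f}% faster")
print(f"Ubuntu (j4): {ubuntu_speedup:.1f}% faster")
```

Under that definition the two platforms land at roughly 15% and 22%. Measured instead as time saved relative to master, the same rows give somewhat smaller percentages, so the exact number depends on which convention is used.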
