Redesign the Dispatcher to not need FunctionInterface to convert dynamic types
Here are the binary size and build time improvements of this branch compared to master
OSX + apple clang 7.3.0 + tbb binary size(s)
target | master | branch |
---|---|---|
vtkm_cont | 1.8M (1873376) | 1.8M (1868120) |
vtkm_rendering | 11M (11909996) | 8.4M (8769036) |
UnitTests_vtkm_filter_testing | 24M (24993148) | 19M (19629748) |
WorkletTests_TBB | 15M (15220992) | 12M (12361872) |
Clipping_TBB | 2.3M (2424148) | 1.6M (1664612) |
Ubuntu 14.04 + GCC 6.3 + CUDA 9 + tbb binary size(s)
target | master | branch |
---|---|---|
vtkm_cont | 6.7M (6970504) | 6.4M (6708512) |
vtkm_rendering | 31M (32032200) | 30M (30642792) |
UnitTests_vtkm_filter_testing | 18M (18672400) | 17M (16796624) |
WorkletTests_TBB | 11M (11461760) | 10M (10541256) |
Clipping_TBB | 1.8M (1835848) | 1.5M (1519104) |
WorkletTests_CUDA | 112M (117102256) | 111M (115710000) |
Clipping_CUDA | 7.4M (7752912) | 6.9M (7221040) |
As far as binary size reduction goes, these changes have no significant effect with a newish GCC, but the older version of clang now produces noticeably less code (for example, vtkm_rendering shrinks from 11M to 8.4M)
OSX + apple clang 7.3.0 + tbb build time (j1)
target | master | branch |
---|---|---|
vtkm_cont && vtkm_rendering | 245.25s | 212.66s |
UnitTests_vtkm_filter_testing | 449.08s | 397.15s |
WorkletTests_TBB | 318.16s | 290.68s |
Clipping_TBB | 31.21s | 25.40s |
Ubuntu 14.04 + GCC 6.3 + CUDA 9 + tbb build time (j4)
target | master | branch |
---|---|---|
vtkm_cont && vtkm_rendering | 7m23.606s | 6m4.577s |
UnitTests_vtkm_filter_testing | 2m12.626s | 2m4.436s |
WorkletTests_TBB | 1m38.271s | 1m29.919s |
Clipping_TBB | 0m32.570s | 0m25.156s |
WorkletTests_CUDA | 9m20.429s | 9m7.074s |
Clipping_CUDA | 1m28.056s | 1m15.928s |
Compile times are where we start to see a real improvement, with the primary saving being that the core libraries (vtkm_cont and vtkm_rendering) compile about 15-20% faster.
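
For anyone who wants to check that figure against the tables, here is a small sketch of the arithmetic in Python. The helper names are made up for illustration and are not part of VTK-m; the timings are copied from the vtkm_cont && vtkm_rendering rows above. Note that "percent less time" and "percent faster" are different ratios, so the result depends on which one is meant.

```python
# Wall-clock build times for "vtkm_cont && vtkm_rendering", copied from the
# build time tables above. All helper names here are illustrative only.

def to_seconds(minutes, seconds):
    """Convert a 'XmY.ZZZs' style timing into plain seconds."""
    return 60 * minutes + seconds

core_build_times = {
    # configuration: (master, branch), both in seconds
    "OSX + apple clang 7.3.0 (j1)": (245.25, 212.66),
    "Ubuntu 14.04 + GCC 6.3 (j4)": (to_seconds(7, 23.606), to_seconds(6, 4.577)),
}

def percent_less_time(master, branch):
    """Reduction in build time, relative to master."""
    return 100.0 * (master - branch) / master

def percent_faster(master, branch):
    """Increase in build speed, relative to master."""
    return 100.0 * (master / branch - 1.0)

for config, (master, branch) in core_build_times.items():
    print(f"{config}: {percent_less_time(master, branch):.1f}% less time, "
          f"{percent_faster(master, branch):.1f}% faster")
```

Measured as a speedup, both configurations land in or near the quoted range; measured as time saved, the clang build comes out slightly lower.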