
vtkm::Vec: unrolls arithmetic operators

Vicente Bolea requested to merge vbolea/vtk-m:adding-vtkm-vec into master

This MR explicitly unrolls the arithmetic operators of vtkm::Vec for the two operand forms Vector op Vector and Vector op Scalar.

UPDATE

The benchmark results below are from the latest commit.

Rationale

As discussed in a follow-up to a past VTK-m weekly meeting, some compilers give up on the unrolling optimization in the vtkm::Vec arithmetic operators when several of them are chained, that is, in expressions such as v1 + v2 + ... + vN.
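To make the difference concrete, here is a minimal sketch that contrasts a loop-based component-wise addition with an explicitly unrolled one. Vec3, AddLoop and Sum4 are hypothetical names invented for this illustration; they are not the vtkm::Vec implementation, and the sketch makes no claim about what any particular compiler does with either form.

```cpp
#include <cassert>

// Hypothetical stand-in for a 3-component vector; NOT the actual vtkm::Vec.
struct Vec3
{
  float v[3];
};

// Loop-based form: it is up to the compiler to unroll this loop.
inline Vec3 AddLoop(const Vec3& a, const Vec3& b)
{
  Vec3 r{};
  for (int i = 0; i < 3; ++i)
  {
    r.v[i] = a.v[i] + b.v[i];
  }
  return r;
}

// Explicitly unrolled form: one expression per component, no loop left to unroll.
inline Vec3 operator+(const Vec3& a, const Vec3& b)
{
  return Vec3{ { a.v[0] + b.v[0], a.v[1] + b.v[1], a.v[2] + b.v[2] } };
}

// Chained use, i.e. the v1 + v2 + ... + vN pattern.
inline Vec3 Sum4(const Vec3& a, const Vec3& b, const Vec3& c, const Vec3& d)
{
  return a + b + c + d;
}
```

Both forms compute the same result; the unrolled one simply does not depend on the optimizer to remove the loop in each link of the chain.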

Design caveats

Unfortunately, this required a large number of arithmetic functions to cover all the variations we might find in our codebase.

For that reason, the functions are generated with pre-processor macros.
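As a rough illustration of macro-based generation, the sketch below stamps out the Vector op Vector and Vector op Scalar overloads for four operators. The Vec2 type and the SKETCH_VEC2_OPERATOR macro are hypothetical and exist only for this example; the macros in the MR itself may be organized quite differently.

```cpp
#include <cassert>

// Hypothetical 2-component vector used only for this sketch.
template <typename T>
struct Vec2
{
  T x;
  T y;
};

// Hypothetical macro: for one operator OP it generates the Vec-Vec and
// Vec-Scalar overloads, each written out component by component.
// The scalar must have exactly the component type T for deduction to succeed.
#define SKETCH_VEC2_OPERATOR(OP)                                  \
  template <typename T>                                           \
  inline Vec2<T> operator OP(const Vec2<T>& a, const Vec2<T>& b)  \
  {                                                               \
    return Vec2<T>{ a.x OP b.x, a.y OP b.y };                     \
  }                                                               \
  template <typename T>                                           \
  inline Vec2<T> operator OP(const Vec2<T>& a, T s)               \
  {                                                               \
    return Vec2<T>{ a.x OP s, a.y OP s };                         \
  }

// One line per operator instead of two hand-written functions each.
SKETCH_VEC2_OPERATOR(+)
SKETCH_VEC2_OPERATOR(-)
SKETCH_VEC2_OPERATOR(*)
SKETCH_VEC2_OPERATOR(/)

#undef SKETCH_VEC2_OPERATOR
```

With these overloads in place, an expression such as (a + b) * 2.0 - 1.0 on Vec2<double> operands resolves entirely to the generated per-component functions. Covering more vector sizes or operand combinations would mean adding further macro variants, which is where the number of generated functions grows.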

Impact

No difference in size of generated libraries.


library                          master   unroll
libvtkm_filter_gradient-1.5.a    185M     185M
libvtkm_filter_common-1.5.a      194M     194M
libvtkm_filter_extra-1.5.a       216M     216M
libvtkm_cont-1.5.a               238M     238M
libvtkm_filter_contour-1.5.a     259M     259M
libvtkm_rendering-1.5.a          286M     286M
libvtkm_io-1.5.a                 29M      29M
libvtkm_worklet-1.5.a            33M      33M
libvtkm_source-1.5.a             8.4M     8.4M

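There is a slight performance improvement: in the comparison below, run times drop by about 1% (BenchThreshold) to about 17% (BenchGradientPoint).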

Comparing master.json to unroll.json
Benchmark                                                       Time             CPU      Time Old      Time New       CPU Old       CPU New                                                                        
--------------------------------------------------------------------------------------------------------------------------------------------                                                                        
BenchGradientScalar/manual_time                              -0.0487         -0.0504           997           949           997           947                                                                        
BenchGradientVector/manual_time                              -0.0387         -0.0403          2713          2608          2713          2603                                                                        
BenchGradientVectorRow/manual_time                           -0.0348         -0.0364          2617          2526          2617          2522                                                                        
BenchGradientPoint/manual_time                               -0.1744         -0.1763           867           716           867           714                                                                        
BenchGradientDivergence/manual_time                          -0.0404         -0.0405          2648          2541          2648          2540                                                                        
BenchGradientVorticity/manual_time                           -0.0384         -0.0383          2681          2578          2681          2578                                                                        
BenchGradientQCriterion/manual_time                          -0.0370         -0.0370          2697          2597          2696          2597                                                                        
BenchGradientAll/manual_time                                 -0.1182         -0.1165          1113           981          1112           983                                                                        
BenchThreshold/manual_time                                   -0.0092         -0.0075           584           579           584           580                                                                        
BenchThresholdPoints/CompactPts:0/manual_time                -0.0578         -0.0577           441           416           441           416                                                                        
BenchThresholdPoints/CompactPts:1/manual_time                -0.0474         -0.0474          3898          3713          3897          3713          

Signed-off-by: Vicente Adolfo Bolea Sanchez <vicente.bolea@kitware.com>

Edited by Vicente Bolea
