    Perform fewer unnecessary copies when deducing a worklet's parameters. · c1560e2d
    Robert Maynard authored
    One of the causes of the large library size and slow compile times has been
    that vtkm makes copies of objects where none are needed. When the objects
    being copied use shared_ptr, this bloats the library. I presume the bloat is
    caused by the atomic increment/decrement that shared_ptr requires on every
    copy.
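
    As a minimal sketch of the presumed mechanism, the snippet below uses only
    the standard library. `SharedHandle`, `OwnersSeenByValue`, and
    `OwnersSeenByReference` are hypothetical names made up for illustration and
    are not part of vtkm. Passing the handle by value copies its shared_ptr and
    bumps the reference count, while passing it by const reference does not.

    ```cpp
    #include <memory>
    #include <vector>

    // Hypothetical stand-in for a handle that shares its storage.
    // This is NOT vtkm::cont::ArrayHandle, only an illustration.
    struct SharedHandle
    {
      std::shared_ptr<std::vector<float>> Storage =
        std::make_shared<std::vector<float>>();
    };

    // Pass by value: the handle is copied, so the shared_ptr reference count
    // is incremented on entry and decremented again on exit.
    inline long OwnersSeenByValue(SharedHandle handle)
    {
      return handle.Storage.use_count();
    }

    // Pass by const reference: no copy is made, so the reference count is
    // left untouched.
    inline long OwnersSeenByReference(const SharedHandle& handle)
    {
      return handle.Storage.use_count();
    }
    ```

    Calling both functions with the same named `SharedHandle` object shows one
    extra owner inside the by-value call and none inside the by-reference call.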
    
    For testing I used the following example:
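    ```
    #include <iostream>
    #include <vector>

    #include <vtkm/cont/ArrayHandle.h>
    #include <vtkm/cont/DynamicArrayHandle.h>
    #include <vtkm/worklet/DispatcherMapField.h>
    #include <vtkm/worklet/WorkletMapField.h>
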
    struct ExampleFieldWorklet : public vtkm::worklet::WorkletMapField
    {
      typedef void ControlSignature( FieldIn<>, FieldIn<>, FieldIn<>,
                                     FieldOut<>, FieldOut<>, FieldOut<> );
      typedef void ExecutionSignature( _1, _2, _3, _4, _5, _6 );
    
      template<typename T, typename U, typename V>
      VTKM_EXEC_EXPORT
      void operator()( const vtkm::Vec< T, 3 > & vec,
                       const U & scalar1,
                       const V& scalar2,
                       vtkm::Vec<T, 3>& out_vec,
                       U& out_scalar1,
                       V& out_scalar2 ) const
      {
        out_vec = vec * scalar1;
        out_scalar1 = scalar1 + scalar2;
        out_scalar2 = scalar2;
      }
    
      template<typename T, typename U, typename V, typename W, typename X, typename Y>
      VTKM_EXEC_EXPORT
      void operator()( const T & vec,
                       const U & scalar1,
                       const V& scalar2,
                       W& out_vec,
                       X& out_scalar,
                       Y& ) const
      {
      //no-op
      }
    };
    
    int main(int argc, char** argv)
    {
      std::vector< vtkm::Vec<vtkm::Float32, 3> > inputVec;
      std::vector< vtkm::Int32 > inputScalar1;
      std::vector< vtkm::Float64 > inputScalar2;
    
      vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleV =
        vtkm::cont::make_ArrayHandle(inputVec);
    
      vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleS1 =
        vtkm::cont::make_ArrayHandle(inputVec);
    
      vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleS2 =
        vtkm::cont::make_ArrayHandle(inputVec);
    
      vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleOV;
      vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleOS1;
      vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleOS2;
    
      std::cout << "Making 3 output DynamicArrayHandles " << std::endl;
      vtkm::cont::DynamicArrayHandle out1(handleOV), out2(handleOS1), out3(handleOS2);
    
      typedef vtkm::worklet::DispatcherMapField<ExampleFieldWorklet> DispatcherType;
    
      std::cout << "Invoking ExampleFieldWorklet" << std::endl;
      DispatcherType dispatcher;
    
      dispatcher.Invoke(handleV, handleS1, handleS2, out1, out2, out3);
    
    }
    ```
    
    Before this change, vtkm generated a 4684 KB binary for this example and
    performed 91 ArrayHandle copies or assignments. With this branch the binary
    size drops to 2392 KB and the example performs 36 copies or assignments.
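
    The message does not say how the copies and assignments were counted. One
    possible approach, sketched here under that assumption, is to instrument the
    copy constructor and copy assignment operator of the handle type.
    `CountingHandle` and `CountCopies` are hypothetical names and are not part of
    vtkm.

    ```cpp
    #include <memory>
    #include <vector>

    // Hypothetical instrumented handle: every copy construction or copy
    // assignment increments a shared counter.
    struct CountingHandle
    {
      static int& Copies()
      {
        static int count = 0;
        return count;
      }

      CountingHandle() = default;

      CountingHandle(const CountingHandle& other)
        : Storage(other.Storage)
      {
        ++Copies();
      }

      CountingHandle& operator=(const CountingHandle& other)
      {
        Storage = other.Storage;
        ++Copies();
        return *this;
      }

      std::shared_ptr<std::vector<float>> Storage =
        std::make_shared<std::vector<float>>();
    };

    // Runs the callable and returns how many copies or assignments of
    // CountingHandle happened while it ran.
    template <typename Function>
    int CountCopies(Function&& function)
    {
      const int before = CountingHandle::Copies();
      function();
      return CountingHandle::Copies() - before;
    }
    ```

    Wrapping a call site in `CountCopies` before and after a change gives the
    kind of before/after comparison reported above.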