  • v1.4.0 — Release VTK-m 1.4.0 (commit 903c2604)
    1.4.0 is our fifth official release of VTK-m. The major changes to VTK-m from 1.3.0 can be found in: docs/changelog/1.4/release-notes.md

    VTK-m 1.4 Release Notes

    Table of Contents

    1. Core
      • Remove templates from ControlSignature field tags
      • Worklets can now be specialized for a specific device adapter
      • Worklets now support an execution mask
      • Redesign VTK-m Runtime Device Tracking
      • vtkm::cont::Initialize added to make setting up VTK-m runtime state easier
    2. ArrayHandle
      • Add vtkm::cont::ArrayHandleVirtual
      • vtkm::cont::ArrayHandleZip provides a consistent API even with non-writable handles
      • vtkm::cont::VariantArrayHandle replaces vtkm::cont::DynamicArrayHandle
      • vtkm::cont::VariantArrayHandle CastAndCall supports casting to concrete types
      • vtkm::cont::VariantArrayHandle::AsVirtual<T>() performs casting
      • StorageBasic::StealArray() now provides delete function to new owner
    3. Control Environment
      • vtkm::cont::CellLocatorGeneral has been added
      • vtkm::cont::CellLocatorTwoLevelUniformGrid has been renamed to vtkm::cont::CellLocatorUniformBins
      • vtkm::cont::Timer now supports asynchronous and device independent timers
      • vtkm::cont::DeviceAdapterId construction from strings is now case-insensitive
      • vtkm::cont::Initialize will only parse known arguments
    4. Execution Environment
      • VTK-m logs details about each CUDA kernel launch
      • VTK-m CUDA allocations can have managed memory (cudaMallocManaged) enabled/disabled from C++
      • VTK-m CUDA kernel scheduling improved including better defaults, and user customization support
      • VTK-m Reduction algorithm now supports differing input and output types
      • Added specialized operators for ArrayPortalValueReference
    5. Worklets and Filters
      • vtkm::worklet::Invoker now supports worklets which require a Scatter object
      • BitFields are now a supported field input/output type for VTK-m worklets
      • Added a Point Merging worklet
      • vtkm::filter::CleanGrid now can do point merging
      • Added connected component worklets and filters
    6. Build
      • CMake 3.8+ now required to build VTK-m
      • VTK-m now can verify that it installs itself correctly
      • VTK-m now requires CUDA separable compilation to build
      • VTK-m provides a vtkm_filter CMake target
      • vtkm::cont::CellLocatorBoundingIntervalHierarchy is compiled into vtkm_cont
    7. Other
      • LodePNG added as a thirdparty package
      • Optionparser added as a thirdparty package
      • Thirdparty diy now can coexist with external diy
      • Merge benchmark executables into a device dependent shared library
      • Merge rendering testing executables to a shared library
      • Merge worklet testing executables into a device dependent shared library
      • VTK-m runtime device detection properly handles busy CUDA devices

    Core

    Remove templates from ControlSignature field tags

    Previously, several of the ControlSignature tags had a template to specify a type list. This was to specify potential valid value types for an input array. The importance of this typelist was to limit the number of code paths created when resolving a vtkm::cont::VariantArrayHandle (formerly a DynamicArrayHandle). This (potentially) reduced the compile time, the size of libraries/executables, and errors from unexpected types.

    Much has changed since this feature was originally implemented. Since then, the filter infrastructure has been created, and it is through this that most dynamic worklet invocations happen. However, since the filter infrastructure does its own type resolution (and has its own policies), the type arguments in ControlSignature are now of little value.

    Script to update code

    This update requires changes to just about all code implementing a VTK-m worklet. To facilitate updating this code (not to mention all the code within VTK-m itself), a script is provided to automatically remove these template parameters from VTK-m code.

    This script is at Utilities/Scripts/update-control-signature-tags.sh. It needs to be run in a Unix-compatible shell. It takes a single argument, which is the top-level directory containing the files to modify. The script processes all C++ source files recursively from that directory.

    Selecting data types for auxiliary filter fields

    The main rationale for making these changes is that the types of the inputs to worklets are almost always already determined by the calling filter. However, although it is straightforward to specify the type of the "main" (active) scalars in a filter, it is less clear what to do for additional fields if a filter needs a second or third field.

    Typically, in the case of a second or third field, it is up to the DoExecute method in the filter implementation to apply a policy to that field. When applying a policy, you give it a policy object (nominally passed by the user) and the traits of the filter. Generally, the accepted list of types for a field should be part of the filter's traits. For example, consider the WarpVector filter. This filter only works on Vecs of size 3, so its traits class looks like this.

    template <>
    class FilterTraits<WarpVector>
    {
    public:
      // WarpVector can only be applied to Float and Double Vec3 arrays
      using InputFieldTypeList = vtkm::TypeListTagFieldVec3;
    };

    However, the WarpVector filter also requires two fields instead of one. The first (active) field is handled by its superclass (FilterField), but the second (auxiliary) field must be managed in the DoExecute. Generally, this can be done by simply applying the policy with the filter traits.
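
    Below is a minimal sketch of this pattern. It is not the actual WarpVector implementation; it assumes an ApplyPolicy overload that accepts the filter traits, and the names vectorFieldName and policy stand in for the usual DoExecute inputs.

    // Hedged sketch: narrow the auxiliary (vector) field to the types allowed
    // by the filter traits before handing it to the worklet. The ApplyPolicy
    // overload taking a traits argument is assumed here.
    vtkm::cont::Field vectorField = input.GetField(vectorFieldName);
    auto resolvedVectors = vtkm::filter::ApplyPolicy(
      vectorField, policy, vtkm::filter::FilterTraits<WarpVector>());
    // resolvedVectors is now limited to vtkm::TypeListTagFieldVec3 and can be
    // passed to the worklet invocation.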

    The corner cases

    Most of the calls to worklets happen within filter implementations, which have their own way of narrowing down potential types (as previously described). The majority of the remainder either use static types or work with a variety of types.

    However, there is a minority of corner cases that require a reduction of types. Since the type argument of the worklet ControlSignature arguments is no longer available, the narrowing of types must be done before the call to Invoke.

    This narrowing of arguments is not particularly difficult. Such type-unsure arguments usually come from a VariantArrayHandle (or something that uses one). You can select the types from a VariantArrayHandle simply by using the ResetTypes method. For example, say you know that a variant array is supposed to be a scalar.

    dispatcher.Invoke(variantArray.ResetTypes(vtkm::TypeListTagFieldScalar()),
                      staticArray);

    Even more common is to have a vtkm::cont::Field object. A Field object internally holds a VariantArrayHandle, which is accessible via the GetData method.

    dispatcher.Invoke(field.GetData().ResetTypes(vtkm::TypeListTagFieldScalar()),
                      staticArray);

    Change in executable size

    The whole intention of these template parameters in the first place was to reduce the number of code paths compiled. The hypothesis of this change was that in the current structure the code paths were not being reduced much if at all. If that is true, the size of executables and libraries should not change.

    Here is a recording of the library and executable sizes before this change (using du -sh).

    3.0M    libvtkm_cont-1.2.1.dylib
    6.2M    libvtkm_rendering-1.2.1.dylib
    312K    Rendering_SERIAL
    312K    Rendering_TBB
     22M    Worklets_SERIAL
     23M    Worklets_TBB
     22M    UnitTests_vtkm_filter_testing
    5.7M    UnitTests_vtkm_cont_serial_testing
    6.0M    UnitTests_vtkm_cont_tbb_testing
    7.1M    UnitTests_vtkm_cont_testing

    After the changes, the executable sizes are as follows.

    3.0M    libvtkm_cont-1.2.1.dylib
    6.0M    libvtkm_rendering-1.2.1.dylib
    312K    Rendering_SERIAL
    312K    Rendering_TBB
     21M    Worklets_SERIAL
     21M    Worklets_TBB
     22M    UnitTests_vtkm_filter_testing
    5.6M    UnitTests_vtkm_cont_serial_testing
    6.0M    UnitTests_vtkm_cont_tbb_testing
    7.1M    UnitTests_vtkm_cont_testing

    As we can see, the built sizes have not changed significantly. (If anything, the build is a little smaller.)

    Worklets can now be specialized for a specific device adapter

    This change adds an execution signature tag named Device that passes a DeviceAdapterTag to the worklet's parenthesis operator. This allows the worklet to specialize its operation. This feature is available in all worklets.

    The following example shows a worklet that specializes itself for the CUDA device.

    struct DeviceSpecificWorklet : vtkm::worklet::WorkletMapField
    {
      using ControlSignature = void(FieldIn, FieldOut);
      using ExecutionSignature = _2(_1, Device);
    
      // Specialization for the Cuda device.
      template <typename T>
      T operator()(T x, vtkm::cont::DeviceAdapterTagCuda) const
      {
        // Special cuda implementation
      }
    
      // General implementation
      template <typename T, typename Device>
      T operator()(T x, Device) const
      {
        // General implementation
      }
    };

    Effect on compile time and binary size

    This change necessitated adding a template parameter for the device that is passed from at least the schedule call all the way down. This has the potential for duplicating several of the support methods (like DoWorkletInvokeFunctor) that would otherwise have the same type. This is especially true between the devices that run on the CPU, as they should all be sharing the same portals from ArrayHandles. So the question is whether this causes compilation to take longer or causes a significant increase in binary size.

    To informally test, I first ran a clean debug compile on my Windows machine with the serial and tbb devices. The build itself took 3 minutes, 50 seconds. Here is a list of the binary sizes in the bin directory:

    kmorel2 0> du -sh *.exe *.dll
    200K    BenchmarkArrayTransfer_SERIAL.exe
    204K    BenchmarkArrayTransfer_TBB.exe
    424K    BenchmarkAtomicArray_SERIAL.exe
    424K    BenchmarkAtomicArray_TBB.exe
    440K    BenchmarkCopySpeeds_SERIAL.exe
    580K    BenchmarkCopySpeeds_TBB.exe
    4.1M    BenchmarkDeviceAdapter_SERIAL.exe
    5.3M    BenchmarkDeviceAdapter_TBB.exe
    7.9M    BenchmarkFieldAlgorithms_SERIAL.exe
    7.9M    BenchmarkFieldAlgorithms_TBB.exe
    22M     BenchmarkFilters_SERIAL.exe
    22M     BenchmarkFilters_TBB.exe
    276K    BenchmarkRayTracing_SERIAL.exe
    276K    BenchmarkRayTracing_TBB.exe
    4.4M    BenchmarkTopologyAlgorithms_SERIAL.exe
    4.4M    BenchmarkTopologyAlgorithms_TBB.exe
    712K    Rendering_SERIAL.exe
    712K    Rendering_TBB.exe
    708K    UnitTests_vtkm_cont_arg_testing.exe
    1.7M    UnitTests_vtkm_cont_internal_testing.exe
    13M     UnitTests_vtkm_cont_serial_testing.exe
    14M     UnitTests_vtkm_cont_tbb_testing.exe
    18M     UnitTests_vtkm_cont_testing.exe
    13M     UnitTests_vtkm_cont_testing_mpi.exe
    736K    UnitTests_vtkm_exec_arg_testing.exe
    136K    UnitTests_vtkm_exec_internal_testing.exe
    196K    UnitTests_vtkm_exec_serial_internal_testing.exe
    196K    UnitTests_vtkm_exec_tbb_internal_testing.exe
    2.0M    UnitTests_vtkm_exec_testing.exe
    83M     UnitTests_vtkm_filter_testing.exe
    476K    UnitTests_vtkm_internal_testing.exe
    148K    UnitTests_vtkm_interop_internal_testing.exe
    1.3M    UnitTests_vtkm_interop_testing.exe
    2.9M    UnitTests_vtkm_io_reader_testing.exe
    548K    UnitTests_vtkm_io_writer_testing.exe
    792K    UnitTests_vtkm_rendering_testing.exe
    3.7M    UnitTests_vtkm_testing.exe
    320K    UnitTests_vtkm_worklet_internal_testing.exe
    65M     UnitTests_vtkm_worklet_testing.exe
    11M     vtkm_cont-1.3.dll
    2.1M    vtkm_interop-1.3.dll
    21M     vtkm_rendering-1.3.dll
    3.9M    vtkm_worklet-1.3.dll

    After making the singular change to the Invocation object to add the DeviceAdapterTag as a template parameter (which should cause any extra compile instances to be created), the compile took 4 minutes and 5 seconds. Here is the new list of binaries.

    kmorel2 0> du -sh *.exe *.dll
    200K    BenchmarkArrayTransfer_SERIAL.exe
    204K    BenchmarkArrayTransfer_TBB.exe
    424K    BenchmarkAtomicArray_SERIAL.exe
    424K    BenchmarkAtomicArray_TBB.exe
    440K    BenchmarkCopySpeeds_SERIAL.exe
    580K    BenchmarkCopySpeeds_TBB.exe
    4.1M    BenchmarkDeviceAdapter_SERIAL.exe
    5.3M    BenchmarkDeviceAdapter_TBB.exe
    7.9M    BenchmarkFieldAlgorithms_SERIAL.exe
    7.9M    BenchmarkFieldAlgorithms_TBB.exe
    22M     BenchmarkFilters_SERIAL.exe
    22M     BenchmarkFilters_TBB.exe
    276K    BenchmarkRayTracing_SERIAL.exe
    276K    BenchmarkRayTracing_TBB.exe
    4.4M    BenchmarkTopologyAlgorithms_SERIAL.exe
    4.4M    BenchmarkTopologyAlgorithms_TBB.exe
    712K    Rendering_SERIAL.exe
    712K    Rendering_TBB.exe
    708K    UnitTests_vtkm_cont_arg_testing.exe
    1.7M    UnitTests_vtkm_cont_internal_testing.exe
    13M     UnitTests_vtkm_cont_serial_testing.exe
    14M     UnitTests_vtkm_cont_tbb_testing.exe
    19M     UnitTests_vtkm_cont_testing.exe
    13M     UnitTests_vtkm_cont_testing_mpi.exe
    736K    UnitTests_vtkm_exec_arg_testing.exe
    136K    UnitTests_vtkm_exec_internal_testing.exe
    196K    UnitTests_vtkm_exec_serial_internal_testing.exe
    196K    UnitTests_vtkm_exec_tbb_internal_testing.exe
    2.0M    UnitTests_vtkm_exec_testing.exe
    86M     UnitTests_vtkm_filter_testing.exe
    476K    UnitTests_vtkm_internal_testing.exe
    148K    UnitTests_vtkm_interop_internal_testing.exe
    1.3M    UnitTests_vtkm_interop_testing.exe
    2.9M    UnitTests_vtkm_io_reader_testing.exe
    548K    UnitTests_vtkm_io_writer_testing.exe
    792K    UnitTests_vtkm_rendering_testing.exe
    3.7M    UnitTests_vtkm_testing.exe
    320K    UnitTests_vtkm_worklet_internal_testing.exe
    68M     UnitTests_vtkm_worklet_testing.exe
    11M     vtkm_cont-1.3.dll
    2.1M    vtkm_interop-1.3.dll
    21M     vtkm_rendering-1.3.dll
    3.9M    vtkm_worklet-1.3.dll

    So far the increase is quite negligible.

    Worklets now support an execution mask

    There have recently been use cases where it would be helpful to mask out some of the invocations of a worklet. The idea is that when invoking a worklet with a mask array on the input domain, you might implement your worklet more-or-less like the following.

    VTKM_EXEC void operator()(bool mask, /* other parameters */)
    {
      if (mask)
      {
        // Do interesting stuff
      }
    }

    This works, but what if your mask has mostly false values? In that case, you are spending lots of time loading field data to and from memory for no reason.

    You could potentially get around this problem by adding a scatter to the worklet. However, that will compress the output arrays to only values that are active in the mask. That is problematic if you want the masked output in the appropriate place in the original arrays. You will have to do some complex (and annoying and possibly expensive) permutations of the output arrays.

    Thus, we would like a new feature similar to scatter that instead masks out invocations so that the worklet is simply not run on those outputs.

    New Interface

    The new "Mask" feature that is similar (and orthogonal) to the existing "Scatter" feature. Worklet objects now define a MaskType that provides on object that manages the selections of which invocations are skipped. The following Mask objects are defined.

    • MaskNone - This removes any mask of the output. All outputs are generated. This is the default if no MaskType is explicitly defined.
    • MaskSelect - The user provides an array that specifies whether each output is created, with a 1 meaning that the output should be created and a 0 meaning that it should not.
    • MaskIndices - The user provides an array with a list of indices for all outputs that should be created.

    It will be straightforward to implement other versions of masks. (For example, you could make a mask class that selects every Nth entry.) Those could be made on an as-needed basis.
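
    As a usage sketch (the names here are illustrative, not from VTK-m), a worklet opts into masking by naming its MaskType; the corresponding mask object is then handed to the dispatcher when it is constructed, as described under "Dispatching" below.

    struct DoubleActiveValues : vtkm::worklet::WorkletMapField
    {
      using ControlSignature = void(FieldIn, FieldOut);
      using ExecutionSignature = _2(_1);

      // Skip invocations (and leave their outputs untouched) wherever the
      // select array holds 0.
      using MaskType = vtkm::worklet::MaskSelect;

      template <typename T>
      VTKM_EXEC T operator()(const T& x) const
      {
        return x + x;
      }
    };

    // One 0/1 entry per output; 1 means the worklet runs for that output.
    vtkm::cont::ArrayHandle<vtkm::IdComponent> selectArray;
    vtkm::worklet::MaskSelect mask(selectArray);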

    Implementation

    The implementation follows the same basic idea of how scatters are implemented.

    Mask Classes

    The mask class is required to implement the following items.

    • ThreadToOutputType - A type for an array that maps a thread index (an index in the array) to an output index. A reasonable type for this could be vtkm::cont::ArrayHandle<vtkm::Id>.
    • GetThreadToOutputMap - Given the range for the output (e.g. the number of items in the output domain), returns an array of type ThreadToOutputType that is the actual map.
    • GetThreadRange - Given a range for the output (e.g. the number of items in the output domain), returns the range for the threads (e.g. the number of times the worklet will be invoked).
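
    As a concrete illustration of this interface, here is a hypothetical mask class (not part of VTK-m) that runs the worklet only for every Nth output.

    #include <vtkm/cont/ArrayCopy.h>
    #include <vtkm/cont/ArrayHandle.h>
    #include <vtkm/cont/ArrayHandleCounting.h>

    struct MaskEveryNth
    {
      using ThreadToOutputType = vtkm::cont::ArrayHandle<vtkm::Id>;

      vtkm::Id Stride;

      explicit MaskEveryNth(vtkm::Id stride) : Stride(stride) {}

      // One thread for every Nth output (rounded up).
      VTKM_CONT vtkm::Id GetThreadRange(vtkm::Id outputRange) const
      {
        return (outputRange + this->Stride - 1) / this->Stride;
      }

      // Thread i is mapped to output i * Stride.
      VTKM_CONT ThreadToOutputType GetThreadToOutputMap(vtkm::Id outputRange) const
      {
        ThreadToOutputType map;
        vtkm::cont::ArrayCopy(
          vtkm::cont::ArrayHandleCounting<vtkm::Id>(
            0, this->Stride, this->GetThreadRange(outputRange)),
          map);
        return map;
      }
    };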

    Dispatching

    The vtkm::worklet::internal::DispatcherBase manages a mask class in the same way it manages the scatter class. It gets the MaskType from the worklet it is templated on. It requires a MaskType object during its construction.

    Previously the dispatcher (and downstream) had to manage the range and indices of inputs and threads. They now also have to manage a separate output range/index, as all three may now be different.

    The vtkm::Invocation is changed to hold the ThreadToOutputMap array from the mask. It likewise has a templated ChangeThreadToOutputMap method added (similar to those already existing for the arrays from a scatter). This method is used in DispatcherBase::InvokeTransportParameters to add the mask's array to the invocation before calling InvokeSchedule.

    Thread Indices

    With the addition of masks, the ThreadIndices classes are changed to manage the actual output index. Previously, the output index was always the same as the thread index. However, now these two can be different. The GetThreadIndices methods of the worklet base classes have an argument added that is the portal to the ThreadToOutputMap.

    The worklet GetThreadIndices is called from the Task classes. These classes are changed to pass in this additional argument. Since the Task classes get an Invocation object from the dispatcher, which contains the ThreadToOutputMap, this change is trivial.

    Interaction Between Mask and Scatter

    Although it seems weird, it should work fine to mix scatters and masks. The scatter will first be applied to the input to generate a (potential) list of output elements. The mask will then be applied to these output elements.

    Redesign VTK-m Runtime Device Tracking

    The device tracking infrastructure in VTK-m has been redesigned to remove multiple redundant code paths and to simplify reasoning about what an instance of RuntimeDeviceTracker will modify.

    vtkm::cont::RuntimeDeviceTracker tracks runtime information on a per-user-thread basis. This is done to allow multiple calling threads to use different VTK-m backends, as seen in this example:

      vtkm::cont::DeviceAdapterTagCuda cuda;
      vtkm::cont::DeviceAdapterTagOpenMP openmp;
      { // thread 1
        auto& tracker = vtkm::cont::GetRuntimeDeviceTracker();
        tracker->ForceDevice(cuda);
        vtkm::worklet::Invoker invoke;
        invoke(LightTask{}, input, output);
        vtkm::cont::Algorithm::Sort(output);
        invoke(HeavyTask{}, output);
      }
    
     { // thread 2
        auto& tracker = vtkm::cont::GetRuntimeDeviceTracker();
        tracker->ForceDevice(openmp);
        vtkm::worklet::Invoker invoke;
        invoke(LightTask{}, input, output);
        vtkm::cont::Algorithm::Sort(output);
        invoke(HeavyTask{}, output);
      }

    Note: GetGlobalRuntimeDeviceTracker has been refactored to be GetRuntimeDeviceTracker, as it always returned a unique instance for each control-side thread. This design allows different threads to have different runtime device settings. By removing the term Global from the name, it becomes clearer what scope this class has.

    While this addresses the ability for threads to specify which device they should run on, it doesn't make it easy to toggle the status of a device programmatically. For example, the following block forces execution to only occur on CUDA and doesn't restore the previously active devices afterward:

      {
      vtkm::cont::DeviceAdapterTagCuda cuda;
      auto& tracker = vtkm::cont::GetRuntimeDeviceTracker();
      tracker->ForceDevice(cuda);
      vtkm::worklet::Invoker invoke;
      invoke(LightTask{}, input, output);
      }
      //openmp/tbb/... still inactive

    To resolve those issues we have vtkm::cont::ScopedRuntimeDeviceTracker, which has the same interface as vtkm::cont::RuntimeDeviceTracker but additionally resets any per-user-thread modifications when it goes out of scope. So by switching the previous example over to use ScopedRuntimeDeviceTracker, we correctly restore the thread's RuntimeDeviceTracker state when the tracker goes out of scope.

      {
      vtkm::cont::DeviceAdapterTagCuda cuda;
      vtkm::cont::ScopedRuntimeDeviceTracker tracker(cuda);
      vtkm::worklet::Invoker invoke;
      invoke(LightTask{}, input, output);
      }
      //openmp/tbb/... are now again active

    The vtkm::cont::ScopedRuntimeDeviceTracker is not limited to forcing execution to occur on a single device. When constructed it can either force execution to a device, disable a device or enable a device. These options also work with the DeviceAdapterTagAny.

      {
      //enable all devices
      vtkm::cont::DeviceAdapterTagAny any;
      vtkm::cont::ScopedRuntimeDeviceTracker tracker(any,
                                                     vtkm::cont::RuntimeDeviceTrackerMode::Enable);
      ...
      }
    
      {
      //disable only cuda
      vtkm::cont::DeviceAdapterTagCuda cuda;
      vtkm::cont::ScopedRuntimeDeviceTracker tracker(cuda,
                                                     vtkm::cont::RuntimeDeviceTrackerMode::Disable);
    
      ...
      }

    vtkm::cont::Initialize added to make setting up VTK-m runtime state easier

    A new initialization function, vtkm::cont::Initialize, has been added. Initialization is not required, but will configure the logging utilities (when enabled) and allows forcing a device via a -d or --device command line option.

    Usage:

    #include <vtkm/cont/Initialize.h>
    
    int main(int argc, char *argv[])
    {
      auto config = vtkm::cont::Initialize(argc, argv);
    
      ...
    }

    ArrayHandle

    Add vtkm::cont::ArrayHandleVirtual

    Added a new class named vtkm::cont::ArrayHandleVirtual that allows you to type-erase an ArrayHandle's storage type by using virtual calls. This simplification makes storing Fields and Coordinates significantly easier, as VTK-m doesn't need to deduce both the storage and value type when executing worklets.

    To construct a vtkm::cont::ArrayHandleVirtual, one can do the following:

    vtkm::cont::ArrayHandle<vtkm::Float32> pressure;
    vtkm::cont::ArrayHandleConstant<vtkm::Float32> constant(42.0f);
    
    
    // construct from an array handle
    vtkm::cont::ArrayHandleVirtual<vtkm::Float32> v(pressure);
    
    // or assign from an array handle
    v = constant;
    

    To help maintain performance vtkm::cont::ArrayHandleVirtual provides a collection of helper functions/methods to query and cast back to the concrete storage and value type:

    vtkm::cont::ArrayHandleConstant<vtkm::Float32> constant(42.0f);
    vtkm::cont::ArrayHandleVirtual<vtkm::Float32> v = constant;
    
    const bool isConstant = vtkm::cont::IsType< decltype(constant) >(v);
    if(isConstant)
      vtkm::cont::ArrayHandleConstant<vtkm::Float32> t = vtkm::cont::Cast< decltype(constant) >(v);
    

    Lastly, a common operation for calling code that uses ArrayHandleVirtual is to construct a new instance of an existing virtual handle with the same storage type. This can be done by using the NewInstance method, as seen below:

    vtkm::cont::ArrayHandle<vtkm::Float32> pressure;
    vtkm::cont::ArrayHandleVirtual<vtkm::Float32> v = pressure;
    
    vtkm::cont::ArrayHandleVirtual<vtkm::Float32> newArray = v.NewInstance();
    bool isConstant = vtkm::cont::IsType< vtkm::cont::ArrayHandle<vtkm::Float32> >(newArray); //will be true

    vtkm::cont::ArrayHandleZip provides a consistent API even with non-writable handles

    Previously vtkm::cont::ArrayHandleZip could not wrap an implicit handle and provide a consistent experience. The primary issue was that if you tried to use the PortalType returned by GetPortalControl() you would get a compile failure. This would occur as the PortalType returned would try to call Set on an ImplicitPortal which doesn't have a set method.

    Now with this change, the ZipPortal uses SFINAE to determine if Set and Get should call the underlying zipped portals.

    vtkm::cont::VariantArrayHandle replaces vtkm::cont::DynamicArrayHandle

    vtkm::cont::VariantArrayHandle replaces vtkm::cont::DynamicArrayHandle as the primary method for holding onto a type-erased vtkm::cont::ArrayHandle. The major difference between the two implementations is how they handle the Storage component of an array handle.

    The vtkm::cont::DynamicArrayHandle approach was to find the fully deduced type of the ArrayHandle, meaning it would check all value and storage types it knew about until it found a match. This cross product of values and storages would cause significant compilation times when a DynamicArrayHandle had multiple storage types.

    The vtkm::cont::VariantArrayHandle approach is to only deduce the value type of the ArrayHandle and return a vtkm::cont::ArrayHandleVirtual, which uses polymorphism to hide the actual storage type. This approach allows for better compile times and for calling code to always expect an ArrayHandleVirtual instead of the fully deduced type. This conversion to ArrayHandleVirtual is usually done internally within VTK-m when a worklet or filter is invoked.

    In certain cases users of VariantArrayHandle want to be able to access the concrete ArrayHandle<T,S> and not have it wrapped in an ArrayHandleVirtual. For those occurrences VariantArrayHandle provides a collection of helper functions/methods to query and cast back to the concrete storage and value type:

    vtkm::cont::ArrayHandleConstant<vtkm::Float32> constant(42.0f);
    vtkm::cont::VariantArrayHandle v(constant);
    
    const bool isConstant = vtkm::cont::IsType< decltype(constant) >(v);
    if(isConstant)
      vtkm::cont::ArrayHandleConstant<vtkm::Float32> t = vtkm::cont::Cast< decltype(constant) >(v);
    

    Lastly, a common operation for calling code that uses VariantArrayHandle is to construct a new instance of an existing virtual handle with the same storage type. This can be done by using the NewInstance method, as seen below:

    vtkm::cont::ArrayHandle<vtkm::Float32> pressure;
    vtkm::cont::VariantArrayHandle v(pressure);
    
    vtkm::cont::VariantArrayHandle newArray = v.NewInstance();
    const bool isConstant = vtkm::cont::IsType< decltype(pressure) >(newArray); //will be true

    vtkm::cont::VariantArrayHandle CastAndCall supports casting to concrete types

    Previously, the VariantArrayHandle::CastAndCall (and indirect calls through vtkm::cont::CastAndCall) attempted to cast to only vtkm::cont::ArrayHandleVirtual with different value types. That worked, but it meant that whatever was called had to operate through virtual functions.

    Under most circumstances, it is worthwhile to also check for some common storage types that, when encountered, can be accessed much faster. This change provides the casting to concrete storage types and now uses vtkm::cont::ArrayHandleVirtual as a fallback when no concrete storage type is found.

    By default, CastAndCall checks all the storage types in VTKM_DEFAULT_STORAGE_LIST_TAG, which typically contains only the basic storage. The VariantArrayHandle::CastAndCall method also allows you to override this behavior by specifying a different type list in the first argument. If the first argument is a list type, CastAndCall assumes that all the types in the list are storage tags. If you pass in vtkm::ListTagEmpty, then CastAndCall will always cast to an ArrayHandleVirtual (the previous behavior). Alternatively, you can pass in storage tags that might be likely under the current usage.

    As an example, consider the following simple code.

    vtkm::cont::VariantArrayHandle array;
    
    // stuff happens
    
    array.CastAndCall(myFunctor);

    Previously, myFunctor would be called with vtkm::cont::ArrayHandleVirtual<T> with different type Ts. After this change, myFunctor will be called with that and with vtkm::cont::ArrayHandle<T> of the same type Ts.

    If you want to only call myFunctor with vtkm::cont::ArrayHandleVirtual<T>, then replace the previous line with

    array.CastAndCall(vtkm::ListTagEmpty(), myFunctor);

    Let's say that additionally using vtkm::cont::ArrayHandleIndex was also common. If you want to also specialize for that array, you can do so with the following line.

    array.CastAndCall(vtkm::ListTagBase<vtkm::cont::StorageTagBasic,
                                        vtkm::cont::ArrayHandleIndex::StorageTag>(),
                      myFunctor);

    Note that myFunctor will be called with vtkm::cont::ArrayHandle<T,vtkm::cont::ArrayHandleIndex::StorageTag>, not vtkm::cont::ArrayHandleIndex.

    vtkm::cont::VariantArrayHandle::AsVirtual<T>() performs casting

    The AsVirtual<T> method of vtkm::cont::VariantArrayHandle now works for any arithmetic type, not just the actual type of the underlying array. This works by inserting an ArrayHandleCast between the underlying concrete array and the new ArrayHandleVirtual when needed.
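
    For example, the following sketch reads a Float32 array as Float64 (the casting is handled internally).

    vtkm::cont::ArrayHandle<vtkm::Float32> concrete;
    vtkm::cont::VariantArrayHandle variant(concrete);

    // The underlying array holds Float32, but it can be viewed as Float64;
    // an ArrayHandleCast is inserted behind the scenes.
    vtkm::cont::ArrayHandleVirtual<vtkm::Float64> asFloat64 =
      variant.AsVirtual<vtkm::Float64>();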

    StorageBasic::StealArray() now provides delete function to new owner

    Memory that is stolen from VTK-m has to be freed correctly. This is required as the memory could have been allocated with new, malloc or even cudaMallocManaged.

    Previously it was very easy to transfer ownership of memory out of VTK-m and either fail to capture the free function, or ask for it after the transfer operation, which would return a nullptr. Now stealing an array also provides the free function, removing one source of memory leaks.

    To properly steal memory from VTK-m you do the following:

      vtkm::cont::ArrayHandle<T> arrayHandle;
    
      ...
    
      auto stolen = arrayHandle.StealArray();
      T* ptr = stolen.first;
      auto free_function = stolen.second;
    
      ...
    
      free_function(ptr);

    Control Environment

    vtkm::cont::CellLocatorGeneral has been added

    vtkm::cont::CellLocatorUniformBins can work with all kinds of datasets, but there are cell locators that are more efficient for specific data sets. Therefore, a new cell locator - vtkm::cont::CellLocatorGeneral has been implemented that can be configured to use specialized cell locators based on its input data. A "configurator" function object can be specified using the SetConfigurator() function. The configurator should have the following signature:

    void (std::unique_ptr<vtkm::cont::CellLocator>&,
         const vtkm::cont::DynamicCellSet&,
         const vtkm::cont::CoordinateSystem&);

    The configurator is invoked whenever the Update method is called and the input has changed. The current cell locator is passed in a std::unique_ptr. Based on the types of the input cellset and coordinates, and possibly some heuristics on their values, the current cell locator's parameters can be updated, or a different cell locator can be instantiated and transferred to the unique_ptr. The default configurator configures a vtkm::cont::CellLocatorUniformGrid for uniform grid datasets, a vtkm::cont::CellLocatorRectilinearGrid for rectilinear datasets, and vtkm::cont::CellLocatorUniformBins for all other dataset types.
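
    The following is a hedged sketch of a custom configurator with the signature above. This hypothetical one always falls back to vtkm::cont::CellLocatorUniformBins; a real configurator would inspect the cell set and coordinates to choose a locator.

    void AlwaysUniformBins(std::unique_ptr<vtkm::cont::CellLocator>& locator,
                           const vtkm::cont::DynamicCellSet&,
                           const vtkm::cont::CoordinateSystem&)
    {
      // Only replace the locator if it is not already the type we want.
      if (dynamic_cast<vtkm::cont::CellLocatorUniformBins*>(locator.get()) == nullptr)
      {
        locator.reset(new vtkm::cont::CellLocatorUniformBins);
      }
    }

    // Usage, assuming a vtkm::cont::CellLocatorGeneral named `general`:
    //   general.SetConfigurator(AlwaysUniformBins);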

    The class CellLocatorHelper that implemented similar functionality to CellLocatorGeneral has been removed.

    vtkm::cont::CellLocatorTwoLevelUniformGrid has been renamed to vtkm::cont::CellLocatorUniformBins

    CellLocatorTwoLevelUniformGrid has been renamed to CellLocatorUniformBins for brevity. It has been modified to be a subclass of vtkm::cont::CellLocator and can be used wherever a CellLocator is accepted.

    vtkm::cont::Timer now supports asynchronous and device independent timers

    vtkm::cont::Timer can now track execution time on a single device or across all enabled devices as seen below:

    vtkm::cont::Timer tbb_timer{vtkm::cont::DeviceAdapterTagTBB()};
    vtkm::cont::Timer all_timer;
    
    all_timer.Start();
    tbb_timer.Start();
    // Run blocking algorithm on tbb
    tbb_timer.Stop();
    // Run async-algorithms cuda
    all_timer.Stop();
    
    // Do more work
    
    // Now get the time for the tbb work and for all the work (tbb + cuda)
    auto tbb_time = tbb_timer.GetElapsedTime();
    auto all_time = all_timer.GetElapsedTime();

    When Timer is constructed without an explicit vtkm::cont::DeviceAdapterId it will track all device adapters and return the maximum elapsed time over all devices when GetElapsedTime is called.

    vtkm::cont::DeviceAdapterId construction from strings is now case-insensitive

    You can now construct a vtkm::cont::DeviceAdapterId from a string regardless of its case. The following all construct the same vtkm::cont::DeviceAdapterId.

    vtkm::cont::DeviceAdapterId id1 = vtkm::cont::make_DeviceAdapterId("cuda");
    vtkm::cont::DeviceAdapterId id2 = vtkm::cont::make_DeviceAdapterId("CUDA");
    vtkm::cont::DeviceAdapterId id3 = vtkm::cont::make_DeviceAdapterId("Cuda");
    
    auto& tracker = vtkm::cont::GetRuntimeDeviceTracker();
    vtkm::cont::DeviceAdapterId id4 = tracker.GetDeviceAdapterId("cuda");
    vtkm::cont::DeviceAdapterId id5 = tracker.GetDeviceAdapterId("CUDA");
    vtkm::cont::DeviceAdapterId id6 = tracker.GetDeviceAdapterId("Cuda");

    vtkm::cont::Initialize will only parse known arguments

    When a library requires reading some command line arguments through a function like Initialize, it is typical that it will parse through arguments it supports and then remove those arguments from argc and argv so that the remaining arguments can be parsed by the calling program. Recent changes to the vtkm::cont::Initialize function support that.

    Use Case

    Say you are creating a simple benchmark where you want to provide a command line option --size that allows you to adjust the size of the data that you are working on. However, you also want to support flags like --device and -v that are handled by vtkm::cont::Initialize. Rather than having to re-implement all of Initialize's parsing, you can now first call Initialize to handle its arguments and then parse the remaining arguments.

    The following is a simple (and rather incomplete) example:

    int main(int argc, char** argv)
    {
      vtkm::cont::InitializeResult initResult = vtkm::cont::Initialize(argc, argv);
    
      if ((argc > 1) && (strcmp(argv[1], "--size") == 0))
      {
        if (argc < 3)
        {
          std::cerr << "--size option requires a numeric argument" << std::endl;
          std::cerr << "USAGE: " << argv[0] << " [options]" << std::endl;
          std::cerr << "Options are:" << std::endl;
          std::cerr << "  --size <number>\tSpecify the size of the data." << std::endl;
          std::cerr << initResult.Usage << std::endl;
          exit(1);
        }
    
        g_size = atoi(argv[2]);
      }
    
      std::cout << "Using device: " << initResult.Device.GetName() << std::endl;

    Additional Initialize Options

    Because vtkm::cont::Initialize no longer has the assumption that it is responsible for parsing all arguments, some options have been added to vtkm::cont::InitializeOptions to manage these different use cases. The following options are now supported.

    • None A placeholder for having all options off, which is the default. (Same as before this change.)
    • RequireDevice Issue an error if the device argument is not specified. (Same as before this change.)
    • DefaultAnyDevice If no device is specified, treat it as if the user gave --device=Any. This means that DeviceAdapterTagUndefined will never be returned in the result.
    • AddHelp Add a help argument. If -h or --help is provided, prints a usage statement. Of course, the usage statement will only print out arguments processed by VTK-m.
    • ErrorOnBadOption If an unknown option is encountered, the program terminates with an error and a usage statement is printed. If this option is not provided, any unknown options are returned in argv. If this option is used, it is a good idea to use AddHelp as well.
    • ErrorOnBadArgument If an extra argument is encountered, the program terminates with an error and a usage statement is printed. If this option is not provided, any unknown arguments are returned in argv.
    • Strict If supplied, Initialize treats its own arguments as the only ones supported by the application and provides an error if not followed exactly. This is a convenience option that is a combination of ErrorOnBadOption, ErrorOnBadArgument, and AddHelp.
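
    A short sketch of combining these options, assuming they behave as combinable bit flags and that Initialize accepts an options argument after argc and argv:

    auto opts = vtkm::cont::InitializeOptions::DefaultAnyDevice |
                vtkm::cont::InitializeOptions::AddHelp;
    vtkm::cont::InitializeResult result = vtkm::cont::Initialize(argc, argv, opts);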

    InitializeResult Changes

    The changes in Initialize have also necessitated the changing of some of the fields in the InitializeResult structure. The following fields are now provided in the InitializeResult struct.

    • Device Returns the device selected in the command line arguments as a DeviceAdapterId. If no device was selected, DeviceAdapterTagUndefined is returned. (Same as before this change.)
    • Usage Returns a string containing the usage for the options recognized by Initialize. This can be used to build larger usage statements containing options for both Initialize and the calling program. See the example above.

    Note that the Arguments field has been removed from InitializeResult. This is because the unparsed arguments are now returned in the modified argc and argv, which provides a more complete result than the Arguments field did.

    Execution Environment

    VTK-m logs details about each CUDA kernel launch

    The VTK-m logging infrastructure has been extended with a new log level KernelLaunches which exists between MemTransfer and Cast.

    This log level reports the number of blocks, threads per block, and the PTX version of each CUDA kernel launched.

    This logging level was primarily introduced to help developers that are tracking down issues that occur when VTK-m components have been built with different sm_XX flags and help people looking to do kernel performance tuning.
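
    For example, assuming the helpers in vtkm/cont/Logging.h, the stderr log level can be raised so that these kernel-launch details are emitted:

    #include <vtkm/cont/Logging.h>

    // Log everything up to and including kernel launch details to stderr.
    vtkm::cont::SetStderrLogLevel(vtkm::cont::LogLevel::KernelLaunches);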

    VTK-m CUDA allocations can have managed memory (cudaMallocManaged) enabled/disabled from C++

    Previously it was impossible for calling code to explicitly disable CUDA managed memory. This can be desirable for projects that know they don't need managed memory and are highly performance critical.

    const bool usingManagedMemory = vtkm::cont::cuda::internal::CudaAllocator::UsingManagedMemory();
    if(usingManagedMemory)
      {  //disable managed memory
      vtkm::cont::cuda::internal::CudaAllocator::ForceManagedMemoryOff();
      }

    VTK-m CUDA kernel scheduling improved including better defaults, and user customization support

    VTK-m now offers a more GPU-aware set of defaults for kernel scheduling. When VTK-m first launches a kernel, we do system introspection to determine what GPUs are on the machine and then match this information to a preset table of values. The implementation is designed in a way that allows VTK-m to offer both specific presets for a given GPU (e.g., V100) and presets for an entire generation of cards (e.g., Pascal).

    Currently VTK-m offers preset tables for the following GPUs:

    • Tesla V100
    • Tesla P100

    If the hardware doesn't match a specific GPU card, we then try to find the nearest known hardware generation and use those defaults. Currently we offer defaults for:

    • Older than Pascal Hardware
    • Pascal Hardware
    • Volta+ Hardware

    Some users have workloads that don't align with the defaults provided by VTK-m. When that is the case, it is possible to override the defaults by binding a custom function to vtkm::cont::cuda::InitScheduleParameters, as shown below:

      ScheduleParameters CustomScheduleValues(char const* name,
                                              int major,
                                              int minor,
                                              int multiProcessorCount,
                                              int maxThreadsPerMultiProcessor,
                                              int maxThreadsPerBlock)
      {
    
        ScheduleParameters params  {
            64 * multiProcessorCount,  //1d blocks
            64,                        //1d threads per block
            64 * multiProcessorCount,  //2d blocks
            { 8, 8, 1 },               //2d threads per block
            64 * multiProcessorCount,  //3d blocks
            { 4, 4, 4 } };             //3d threads per block
        return params;
      }
      vtkm::cont::cuda::InitScheduleParameters(&CustomScheduleValues);

    VTK-m Reduction algorithm now supports differing input and output types

    It is common to want to perform a reduction where the input and output types differ. A basic example would be when the input is vtkm::UInt8 but the output is vtkm::UInt64. This has been supported since v1.2, as long as the input type is implicitly convertible to the output type.

    What we now support is when the input type is not implicitly convertible to the output type, such as when the output type is vtkm::Pair<vtkm::UInt64, vtkm::UInt64>. For this to work, we require that the custom binary operator also implements an operator() that handles the unary transformation of input to output.

    An example of a custom reduction operator for differing input and output types is:

    
      struct CustomMinAndMax
      {
        using OutputType = vtkm::Pair<vtkm::Float64, vtkm::Float64>;
    
        VTKM_EXEC_CONT
        OutputType operator()(vtkm::Float64 a) const
        {
        return OutputType(a, a);
        }
    
        VTKM_EXEC_CONT
        OutputType operator()(vtkm::Float64 a, vtkm::Float64 b) const
        {
          return OutputType(vtkm::Min(a, b), vtkm::Max(a, b));
        }
    
        VTKM_EXEC_CONT
        OutputType operator()(const OutputType& a, const OutputType& b) const
        {
          return OutputType(vtkm::Min(a.first, b.first), vtkm::Max(a.second, b.second));
        }
    
        VTKM_EXEC_CONT
        OutputType operator()(vtkm::Float64 a, const OutputType& b) const
        {
          return OutputType(vtkm::Min(a, b.first), vtkm::Max(a, b.second));
        }
    
        VTKM_EXEC_CONT
        OutputType operator()(const OutputType& a, vtkm::Float64 b) const
        {
          return OutputType(vtkm::Min(a.first, b), vtkm::Max(a.second, b));
        }
      };
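
    A usage sketch (the data values here are illustrative): reduce a Float64 array to a (min, max) pair with the operator above, seeding the reduction with positive and negative infinity.

      #include <vtkm/cont/Algorithm.h>
      #include <vtkm/cont/ArrayHandle.h>
      #include <vtkm/Math.h>
      #include <vector>

      std::vector<vtkm::Float64> data{ 3.0, -1.0, 7.5, 2.25 };
      auto values = vtkm::cont::make_ArrayHandle(data);

      // Start with an "empty" range so that any real value replaces it.
      auto initial = CustomMinAndMax::OutputType(vtkm::Infinity64(),
                                                 vtkm::NegativeInfinity64());
      auto minMax = vtkm::cont::Algorithm::Reduce(values, initial, CustomMinAndMax{});
      // minMax.first == -1.0, minMax.second == 7.5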
    
    

    Added specialized operators for ArrayPortalValueReference

    The ArrayPortalValueReference is supposed to behave just like the value it encapsulates and does so by automatically converting to the base type when necessary. However, when it is possible to convert that value to something else, you can get errors about ambiguous overloads. To avoid these, specialized versions of the operators have been added to specify which ones should be used.

    Also, the CUDA version of ArrayPortalValueReference has been consolidated with the standard one. The two implementations were equivalent, and we would like changes to apply to both.

    Worklets and Filters

    vtkm::worklet::Invoker now supports worklets which require a Scatter object

    This change allows the Invoker class to support launching worklets that require a custom scatter operation. This is done by providing the scatter as the second argument when launching a worklet with the () operator.

    The following example shows a scatter being provided with a worklet launch.

    struct CheckTopology : vtkm::worklet::WorkletMapPointToCell
    {
      using ControlSignature = void(CellSetIn cellset, FieldOutCell);
      using ExecutionSignature = _2(FromIndices);
      using ScatterType = vtkm::worklet::ScatterPermutation<>;
      ...
    };
    
    
    vtkm::worklet::Invoker invoke;
    invoke( CheckTopology{}, vtkm::worklet::ScatterPermutation{}, cellset, result );

    BitFields are now a supported field input/output type for VTK-m worklets

    BitFields are:

    • Stored in memory using a contiguous buffer of bits.
    • Accessible via portals, a la ArrayHandle.
    • Portals operate on individual bits or words.
    • Operations may be atomic for safe use from concurrent kernels.

    The new BitFieldToUnorderedSet device algorithm produces an ArrayHandle containing the indices of all set bits, in no particular order.
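
    For example, a hedged sketch (assuming the algorithm is exposed through vtkm::cont::Algorithm like other device algorithms):

    vtkm::cont::BitField bits;
    // ... populate bits ...

    // Collect the indices of all set bits; the number of set bits is returned.
    vtkm::cont::ArrayHandle<vtkm::Id> setIndices;
    vtkm::Id numSetBits = vtkm::cont::Algorithm::BitFieldToUnorderedSet(bits, setIndices);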

    The new AtomicInterface classes provide an abstraction into bitwise atomic operations across control and execution environments and are used to implement the BitPortals.

    BitFields may be used as boolean-typed ArrayHandles using the ArrayHandleBitField adapter. vtkm::cont::ArrayHandleBitField uses atomic operations to read and write bits in the BitField, and is safe to use in concurrent code.

    For example, a simple worklet that merges two arrays based on a boolean condition:

    class ConditionalMergeWorklet : public vtkm::worklet::WorkletMapField
    {
    public:
    using ControlSignature = void(FieldIn cond,
                                  FieldIn trueVals,
                                  FieldIn falseVals,
                                  FieldOut result);
    using ExecutionSignature = _4(_1, _2, _3);
    
    template <typename T>
    VTKM_EXEC T operator()(bool cond, const T& trueVal, const T& falseVal) const
    {
      return cond ? trueVal : falseVal;
    }
    
    };
    
    BitField bits = ...;
    auto condArray = vtkm::cont::make_ArrayHandleBitField(bits);
    auto trueArray = vtkm::cont::make_ArrayHandleCounting<vtkm::Id>(20, 2, NUM_BITS);
    auto falseArray = vtkm::cont::make_ArrayHandleCounting<vtkm::Id>(13, 2, NUM_BITS);
    vtkm::cont::ArrayHandle<vtkm::Id> output;
    
    vtkm::worklet::Invoker invoke( vtkm::cont::DeviceAdapterTagTBB{} );
    invoke(ConditionalMergeWorklet{}, condArray, trueArray, falseArray, output);
    

    Added a Point Merging worklet

    We have added vtkm::worklet::PointMerge which uses a virtual grid approach to identify nearby points. The worklet works by creating a very fine but sparsely represented locator grid. It then groups points by grid bins and finds those within a specified radius.

    vtkm::filter::CleanGrid now can do point merging

    The CleanGrid filter has been extended to use vtkm::worklet::PointMerge to allow for point merging. The following flags have been added to CleanGrid to modify the behavior of point merging.

    • Set/GetMergePoints - a flag to turn on/off the merging of duplicated coincident points. This extra operation will find points spatially located near each other and merge them together.
    • Set/GetTolerance - Defines the tolerance used when determining whether two points are considered coincident. If the ToleranceIsAbsolute flag is false (the default), then this tolerance is scaled by the diagonal of the points. This parameter is only used when merge points is on.
    • Set/GetToleranceIsAbsolute - When ToleranceIsAbsolute is false (the default) then the tolerance is scaled by the diagonal of the bounds of the dataset. If true, then the tolerance is taken as the actual distance to use. This parameter is only used when merge points is on.
    • Set/GetFastMerge - When FastMerge is true (the default), some corners are cut when computing coincident points. The point merge will go faster but the tolerance will not be strictly followed.
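
    A short usage sketch combining these flags (the tolerance value and the input data set are illustrative):

    vtkm::filter::CleanGrid clean;
    clean.SetMergePoints(true);
    clean.SetTolerance(1.0e-5);          // scaled by the bounds diagonal...
    clean.SetToleranceIsAbsolute(false); // ...because the tolerance is relative
    clean.SetFastMerge(true);
    vtkm::cont::DataSet cleaned = clean.Execute(inputDataSet);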

    Added connected component worklets and filters

    We have added the vtkm::filter::ImageConnectivity and vtkm::filter::CellSetConnectivity filters to identify connected components in DataSets, along with the corresponding worklets. ImageConnectivity identifies connected components in a vtkm::cont::CellSetStructured based on neighboring cells having the same field value. CellSetConnectivity identifies connected components based on cell connectivity.

    Currently, a Moore neighborhood (i.e., 8 neighboring pixels in 2D and 26 neighboring pixels in 3D) is used for ImageConnectivity. For CellSetConnectivity, the neighborhood is defined as cells sharing a common edge.

    Build

    CMake 3.8+ now required to build VTK-m

    Historically VTK-m has offered the ability to build a small subset of device adapters with CMake 3.3. As both of our primary consumers have moved to CMake 3.8, and HPC machines continue to provide newer CMake versions, we have decided to simplify our CMake build system by requiring CMake 3.8 everywhere.

    VTK-m now can verify that it installs itself correctly

    It was fairly common for VTK-m to have a broken install tree, as it had no easy way to verify that all headers would be installed.

    Now VTK-m offers a testing infrastructure that creates a temporary install tree and compiles tests that build against the installed VTK-m version. Currently we have tests that verify that each header listed in VTK-m is installed, that users can compile a custom vtkm::filter that uses diy, and that users can call vtkm::rendering.

    VTK-m now requires CUDA separable compilation to build

    With the introduction of vtkm::cont::ArrayHandleVirtual and the related infrastructure, VTK-m now requires that all CUDA code be compiled using separable compilation (-rdc).

    VTK-m provides a vtkm_filter CMake target

    VTK-m now provides a vtkm_filter target that contains pre-built components of filters for consuming projects.

    vtkm::cont::CellLocatorBoundingIntervalHierarchy is compiled into vtkm_cont

    All of the methods in CellLocatorBoundingIntervalHierarchy were listed in header files. This is sometimes problematic with virtual methods. Since everything implemented in it can just be embedded in a library, the code has been moved into the vtkm_cont library.

    These changes caused some warnings in clang to show up based on virtual methods in other cell locators. Hence, the rest of the cell locators have also had some of their code moved to vtkm_cont.

    Other

    LodePNG added as a thirdparty package

    The lodepng library was brought in as a third-party library. This has allowed the VTK-m rendering library to have robust PNG decode functionality.

    Optionparser added as a thirdparty package

    Previously we just took the optionparser.h file and stuck it right in our source code. That was problematic for a variety of reasons.

    • It incorrectly assigned our license to external code.
    • It made lots of unnecessary changes to the original source (like reformatting).
    • It made it near impossible to track patches we make and updates to the original software.

    Now we use the third-party system to track changes to optionparser.h in the https://gitlab.kitware.com/third-party/optionparser repository.

    Thirdparty diy now can coexist with external diy

    Previously VTK-m would leak macros that would cause an external diy to be incorrectly mangled, breaking consumers of VTK-m that used diy.

    Going forward, to use diy from VTK-m, all calls must use the vtkmdiy namespace instead of the diy namespace. This allows VTK-m to properly forward calls to either the external or internal version correctly.

    Merge benchmark executables into a device dependent shared library

    VTK-m has been updated to replace old per device benchmark executables with a single multi-device executable. Selection of the device adapter is done at runtime through the --device= argument.

    Merge rendering testing executables to a shared library

    VTK-m has been updated to replace old per device rendering testing executables with a single multi-device executable. Selection of the device adapter is done at runtime through the --device= argument.

    Merge worklet testing executables into a device dependent shared library

    VTK-m has been updated to replace the old per-device worklet testing executables with a single multi-device executable. Selection of the device adapter is done at runtime through the --device= argument.

    VTK-m runtime device detection properly handles busy CUDA devices

    When an application that uses VTK-m is first launched, it does a check to see if CUDA is supported at runtime. If for some reason the CUDA card is not allowing kernel execution, VTK-m would report that the hardware doesn't have CUDA support.

    This was problematic, as it was overly aggressive in disabling CUDA support for hardware that could support kernel execution in the future. Given that every VTK-m worklet is executed through a TryExecute, it is no longer necessary to be so aggressive in disabling CUDA support.

    Now the behavior is that VTK-m considers a machine to have CUDA runtime support if it has one or more GPUs of Kepler or newer hardware (SM_30+).

  • v1.3.0 — Release VTK-m 1.3.0 (commit a8da749e)
    1.3.0 is our fourth official release of VTK-m. The major changes to VTK-m from 1.2.0 can be found in: docs/changelog/1.3/release-notes.md

    VTK-m 1.3 Release Notes

    Table of Contents

    1. Core
      • OpenMP Device Adapter
      • Make all worklets dispatches invoke using a TryExecute
      • Remove TryExecute from each filter
      • DeviceAdapterTags are usable for runtime device selection
      • New geometry classes
      • Logging support via loguru.
    2. ArrayHandle
      • ArrayHandle can now take ownership of a user allocated memory location
      • StorageBasic now can provide or be provided a delete function
      • ArrayHandleTransform works with ExecObject
      • Add ArrayHandleView fancy array
      • ArrayHandleCompositeVector simplified and made writable.
      • ArrayHandleExtractComponent target component is now set at runtime
      • ArrayHandleSwizzle component maps are now set at runtime
    3. Control Environment
      • Interfaces for VTK-m spatial search structures added
      • vtkm::cont::Algorithm now can be told which device to use at runtime
      • Support ExecArg behavior in vtkm::cont::Algorithm methods
      • vtkm::cont::TryExecuteOnDevice allows for runtime selection of which device to execute on
      • vtkm::cont::CellSetExplicit now caches CellToPoint table when used with Invoke
      • vtkm::cont::RuntimeDeviceInformation doesn't require a device template
      • vtkm::cont::VirtualObjectHandle can transfer to a device using runtime DeviceAdapterId value
      • Add new vtkm::exec and vtkm::cont execution objects
      • Added a ReleaseResources API to CellSet and its derived classes
      • Added a ReleaseResourcesExecution API for Field to unload execution resources
      • Added a ReleaseResourcesExecution API for CoordinateSystem to unload execution resources
      • Use the strong typed enums for vtkm::cont::Field
      • vtkm::cont::DeviceAdapterId has become a real constexpr type and not an alias to vtkm::UInt8
    4. Execution Environment
      • User defined execution objects now usable with runtime selection of device adapter
      • Dot function name changed
      • Added float version operations for vtkm::Math Pi()
      • vtkm::Math Pi functions are now constexpr
      • CellDerivativeFor3DCell has a better version for Vec of Vec fields.
      • Add a CUDA-safe vtkm::Swap method
    5. Worklets and Filters
      • Worklets are now asynchronous on CUDA
      • Worklets now execute on CUDA using grid stride loops
      • Scatter class moved to dispatcher
      • VTK-m now supports dispatcher parameters being pointers
      • Histogram filter supports custom types
      • Cell measure functions, worklet, and filter
      • Add a WaveletGenerator worklet (e.g. vtkRTAnalyticSource)
      • Add a filter to support Lagrangian analysis capabilities
      • Add a warp vector worklet and filter
      • Add a warp scalar worklet and filter
      • Add a split sharp edge filter
      • Time-varying "oscillator" filter and example
    6. Build
      • Build System Redesign and new minimum CMake
      • Add none option to VTKm_CUDA_Architecture
      • Add a new cmake option: VTKm_ENABLE_DEVELOPER_FLAGS
      • Add a new cmake option: VTKm_INSTALL_ONLY_LIBRARIES
    7. Other
      • Control CUDA managed memory with an environment variable
      • Add support for deferred freeing of CUDA memory
      • Allow variable arguments to VTKM_TEST_ASSERT
      • Support constexpr and variadic constructor for Vec
      • vtkm::Vec< vtkm::Vec<T> > can't be constructed from vtkm::Vec<U>
      • Use std::call_once to construct singletons
      • Use thread_local in vtkm::cont::GetGlobalRuntimeDeviceTracker function if possible
      • Replace std::random_shuffle with std::shuffle

    Core

    OpenMP Device Adapter

    A device adapter that leverages OpenMP 4.0 for threading is now available. The new adapter is enabled using the CMake option VTKm_ENABLE_OPENMP and its performance is comparable to the TBB device adapter.

    Performance comparisons of OpenMP against the TBB and Serial device adapters can be found at: #223

    Make all worklets dispatches invoke using a TryExecute

    Rather than force all dispatchers to be templated on a device adapter, the dispatcher now uses a TryExecute internally within the invoke to select a device adapter.

    This change touches quite a bit of code. The first pass of the change usually does the minimum amount of work, which is to change the compile-time specification of the device to a run-time call to SetDevice on the dispatcher. Although functionally equivalent, it can mean a TryExecute ends up being called within another TryExecute.

    Remove TryExecute from each filter

    The recent change to dispatchers has embedded a TryExecute internally within the Invoke function of all dispatchers. This means that it is no longer necessary to specify a device when invoking a worklet.

    Previously, this TryExecute was in the filter layer. The filter superclasses would do a TryExecute and use it to pass a device to subclasses in methods like DoExecute and DoMapField. Since the dispatcher no longer needs a device, this TryExecute is now redundant, so it has been removed.

    Because of this, the device argument to DoExecute and DoMapField has been removed. This will require current filter implementations to change, but it usually simplifies the code. That said, there might be some code that needs to be wrapped into a vtkm::cont::ExecObjectBase.

    No changes need to be made to code that uses filters.

    DeviceAdapterTags are usable for runtime device selection

    VTK-m DeviceAdapterTags now are both a compile time representation of which device to use, and also the runtime representation of that device. Previously the runtime representation was handled by vtkm::cont::DeviceAdapterId. This was done by making DeviceAdapterTags a constexpr type that inherits from the constexpr vtkm::cont::DeviceAdapterId type.

    At a ten-thousand-foot level, this change means that in general, instead of using vtkm::cont::DeviceAdapterTraits<DeviceTag>, you can simply use DeviceTag or an instance of it, such as DeviceTag runtimeDeviceId;.

    Previously if you wanted to get the runtime representation of a device you would do the following:

    template<typename DeviceTag>
    vtkm::cont::DeviceAdapterId getDeviceId()
    {
      using Traits = vtkm::cont::DeviceAdapterTraits<DeviceTag>;
      return Traits::GetId();
    }
    ...
    vtkm::cont::DeviceAdapterId runtimeId = getDeviceId<DeviceTag>();

    Now with the updates you could do the following.

    vtkm::cont::DeviceAdapterId runtimeId = DeviceTag();

    More importantly, this conversion is unnecessary as you can pass instances of DeviceAdapterTags into methods or functions that want a vtkm::cont::DeviceAdapterId, since they are that type!
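
    For example, a tag instance can be handed directly to a function that expects the runtime id. A minimal sketch (RunOnDevice is a hypothetical function used only for illustration, not part of VTK-m):

    // Hypothetical function that accepts the runtime device representation.
    void RunOnDevice(vtkm::cont::DeviceAdapterId device);
    
    void Example()
    {
      // DeviceAdapterTagSerial inherits from DeviceAdapterId, so the tag instance
      // converts without any explicit cast.
      RunOnDevice(vtkm::cont::DeviceAdapterTagSerial{});
    }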

    Previously if you wanted to see if a DeviceAdapter was enabled you would do the following:

    using Traits = vtkm::cont::DeviceAdapterTraits<DeviceTag>;
    constexpr auto isValid = std::integral_constant<bool, Traits::Valid>();

    Now you would do:

    constexpr auto isValid = std::integral_constant<bool, DeviceTag::IsEnabled>();

    So why did VTK-m make these changes?

    That is a good question, and the answer is twofold. The VTK-m project is working on better support for ArrayHandles that leverage runtime polymorphism (aka virtuals), and on the ability to construct vtkm::worklet::Dispatchers without specifying the explicit device they should run on. Both of these designs push more of the VTK-m logic to operate at runtime rather than compile time. These changes are designed to allow for consistent object usage between runtime and compile time instead of having to convert between compile-time and runtime types.

    New geometry classes

    There are now some additional structures available in both the control and execution environments for representing geometric entities (mostly of dimensions 2 and 3). These new structures are now in vtkm/Geometry.h and demonstrated/tested in vtkm/testing/TestingGeometry.h:

    • Ray<CoordType, Dimension, IsTwoSided>. Instances of this struct represent a semi-infinite line segment in a 2-D plane or in a 3-D space, depending on the integer dimension specified as a template parameter. Its state is the point at the start of the ray (Origin) plus the ray's Direction, a unit-length vector. If the third template parameter (IsTwoSided) is true, then the ray serves as an infinite line. Otherwise, the ray will only report intersections in its positive halfspace.
    • LineSegment<CoordType, Dimension>. Instances of this struct represent a finite line segment in a 2-D plane or in a 3-D space, depending on the integer dimension specified as a template parameter. Its state is the coordinates of its Endpoints.
    • Plane<CoordType>. Instances of this struct represent a plane in 3-D. Its state is the coordinates of a base point (Origin) and a unit-length normal vector (Normal).
    • Sphere<CoordType, Dimension>. Instances of this struct represent a d-dimensional sphere. Its state is the coordinates of its center plus a radius. It is also aliased with a using statement to Circle<CoordType> for the specific case of 2-D.

    These structures provide useful queries and generally interact with one another. For instance, it is possible to intersect lines and planes and compute distances.

    For ease of use, there are also several using statements that alias these geometric structures to names that specialize them for a particular dimension or other template parameter. As an example, Ray<CoordType, Dimension, true> is aliased to Line<CoordType, Dimension> and Ray<CoordType, 3, true> is aliased to Line3<CoordType> and Ray<FloatDefault, 3, true> is aliased to Line3d.

    Design patterns

    If you plan to add a new geometric entity type, please adopt these conventions:

    • Each geometric entity may be default-constructed. The default constructor will initialize the state to some valid unit-length entity, usually with some part of its state at the origin of the coordinate system.
    • Entities may always be constructed by passing in values for their internal state. Alternate construction methods are declared as free functions such as make_CircleFrom3Points()
    • Use template metaprogramming to make methods available only when the template dimension gives them semantic meaning. For example, a 2-D line segment's perpendicular bisector is another line segment, but a 3-D line segment's perpendicular line segment is a plane. Note how this is accomplished and apply this pattern to new geometric entities or new methods on existing entities.
    • Some entities may have invalid state. If this is possible, the entity will have an IsValid() method. For example, a sphere may be invalid because the user or some construction technique specified a zero or negative radius.
    • When signed distance is semantically meaningful, provide it in favor of or in addition to unsigned distance.
    • Accept a tolerance parameter when appropriate, but provide a sensible default value. You may want to perform exact arithmetic versions of tests, but please provide fast, tolerance-based versions as well.
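
    As a minimal sketch of the conventions above (namespace placement and the exact alias spellings are assumed from the description in this section), the entities can be default-constructed and used directly:

    #include <vtkm/Geometry.h>
    
    void GeometryExample()
    {
      // Default construction always yields a valid unit-length entity.
      vtkm::Line3d line;                             // alias for Ray<FloatDefault, 3, true>
      vtkm::Plane<vtkm::FloatDefault> plane;         // base point (Origin) plus unit Normal
      vtkm::LineSegment<vtkm::FloatDefault, 2> seg;  // two Endpoints in a 2-D plane
      vtkm::Circle<vtkm::FloatDefault> circle;       // alias for the 2-D Sphere
      (void)line;
      (void)plane;
      (void)seg;
      (void)circle;
    }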

    Logging support via loguru.

    The loguru project has been integrated with VTK-m to provide runtime logging facilities. A sample of the log output can be found at https://gitlab.kitware.com/snippets/427.

    Logging is enabled by setting the CMake variable VTKm_ENABLE_LOGGING. When this flag is enabled, any messages logged to the Info, Warn, Error, and Fatal levels are printed to stderr by default.

    Additional logging features are enabled by calling vtkm::cont::InitLogging in an executable. This will:

    • Set human-readable names for the log levels in the output.
    • Allow the stderr logging level to be set at runtime by passing a '-v [level]' argument to the executable.
    • Name the main thread.
    • Print a preamble with details of the program's startup (args, etc).
    • Install signal handlers to automatically print stacktraces and error contexts (linux only) on crashes.

    The main logging entry points are the macros VTKM_LOG_S and VTKM_LOG_F, which use C++ stream and printf syntax, respectively. Other variants exist, including conditional logging and special-purpose logs for writing specific events, such as DynamicObject cast results and TryExecute failures.

    The logging backend supports the concept of "Scopes". By creating a new scope with the macros VTKM_LOG_SCOPE or VTKM_LOG_SCOPE_FUNCTION, a new "logging scope" is opened within the C++ scope the macro is called from. New messages will be indented in the log until the scope ends, at which point a message is logged with the elapsed time that the scope was active. Scopes may be nested to arbitrary depths.
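
    As a minimal sketch (the messages are arbitrary), an executable might initialize logging and combine the stream-style, printf-style, and scope macros like this:

    #include <vtkm/cont/Logging.h>
    
    void ProcessData(vtkm::Id numValues)
    {
      // Opens a logging scope for this function; the elapsed time is logged
      // when the scope closes.
      VTKM_LOG_SCOPE_FUNCTION(vtkm::cont::LogLevel::Perf);
    
      // Stream-style and printf-style log entries.
      VTKM_LOG_S(vtkm::cont::LogLevel::Info, "Processing " << numValues << " values");
      VTKM_LOG_F(vtkm::cont::LogLevel::Info, "Processing %lld values",
                 static_cast<long long>(numValues));
    }
    
    int main(int argc, char* argv[])
    {
      // Enables named log levels, '-v' parsing, and crash stack traces.
      vtkm::cont::InitLogging(argc, argv);
      ProcessData(100);
      return 0;
    }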

    The logging implementation is thread-safe. When working in a multithreaded environment, each thread may be assigned a human-readable name using vtkm::cont::SetThreadName. This will appear in the log output so that per-thread messages can be easily tracked.

    By default, only Info, Warn, Error, and Fatal messages are printed to stderr. This can be changed at runtime by passing the '-v' flag to an executable that calls vtkm::cont::InitLogging. Alternatively, the application can explicitly call vtkm::cont::SetStderrLogLevel to change the verbosity. When specifying a verbosity, all log levels with enum values less-than-or-equal-to the requested level are printed. vtkm::cont::LogLevel::Off (or "-v Off") may be used to silence the log completely.

    The helper functions vtkm::cont::GetHumanReadableSize and vtkm::cont::GetSizeString assist in formatting byte sizes to a more readable format. Similarly, the vtkm::cont::TypeName template functions provide RTTI-based type-name information. When logging is enabled, these use the logging backend to demangle symbol names on supported platforms.

    The more verbose VTK-m log levels are:

    • Perf: Logs performance information, using the scopes feature to track execution time of filters, worklets, and device algorithms with microsecond resolution.
    • MemCont / MemExec: These levels log memory allocations in the control and execution environments, respectively.
    • MemTransfer: This level logs memory transfers between the control and execution environments.
    • Cast: Logs details of dynamic object resolution.

    The log may be shared and extended by applications that use VTK-m. There are two log level ranges left available for applications: User and UserVerbose. The User levels may be enabled without showing any of the verbose VTK-m levels, while UserVerbose levels will also enable all VTK-m levels.

    ArrayHandle

    ArrayHandle can now take ownership of a user allocated memory location

    Previously memory that was allocated outside of VTK-m was impossible to transfer to VTK-m as we didn't know how to free it. By extending the ArrayHandle constructors to support a Storage object that is being moved, we can clearly express that the ArrayHandle now owns memory it didn't allocate.

    Here is an example of how this is done:

      T* buffer = new T[100];
      auto user_free_function = [](void* ptr) { delete[] static_cast<T*>(ptr); };
    
      vtkm::cont::internal::Storage<T, vtkm::cont::StorageTagBasic>
          storage(buffer, 100, user_free_function);
      vtkm::cont::ArrayHandle<T> arrayHandle(std::move(storage));

    StorageBasic now can provide or be provided a delete function

    Memory that was allocated outside of VTK-m was impossible to transfer to VTK-m as we didn't know how to free it. This is now resolved by allowing the user to specify a free function to be called on release.

    Memory that was allocated by VTK-m and stolen by the user needs the proper free function. When running CUDA on hardware that supports concurrent managed access, the free function of the storage could be cudaFree.

    To properly steal memory from VTK-m you do the following:

      vtkm::cont::ArrayHandle<T> arrayHandle;
      //fill arrayHandle
    
      //you must get the free function before calling steal array
      auto free_function = arrayHandle.GetDeleteFunction();
      T* ptr = arrayHandle.StealArray();
      //use ptr
    
    
      free_function(ptr);

    ArrayHandleTransform works with ExecObject

    Previously, the ArrayHandleTransform class only worked with plain old data (POD) objects as its functors. For simple transforms, this makes sense since all the data comes from a target ArrayHandle that will be sent to the device through a different path. However, this also requires the transform to be known at compile time.

    However, there are cases where the functor cannot be a POD object and has to be built for a specific device. There are numerous reasons for this. One might be that you need some lookup tables. Another might be that you want to support a virtual object, which has to be initialized for a particular device. The standard way to implement this in VTK-m is to create an "execution object." This actually means that we create a wrapper around execution objects that inherits from vtkm::cont::ExecutionAndControlObjectBase and contains a PrepareForExecution method and a PrepareForControl method.

    As an example, consider the use case of a special ArrayHandle that takes the value in one array and returns the index of that value in another sorted array. We can do that by creating a functor that finds a value in an array and returns the index.

    template <typename ArrayPortalType>
    struct FindValueFunctor
    {
      ArrayPortalType SortedArrayPortal;
    
      FindValueFunctor() = default;
    
      VTKM_CONT FindValueFunctor(const ArrayPortalType& sortedPortal)
        : SortedArrayPortal(sortedPortal)
      { }
    
      VTKM_EXEC vtkm::Id operator()(const typename ArrayPortalType::ValueType& value) const
      {
        vtkm::Id leftIndex = 0;
        vtkm::Id rightIndex = this->SortedArrayPortal.GetNumberOfValues();
        while (leftIndex < rightIndex)
        {
          vtkm::Id middleIndex = (leftIndex + rightIndex) / 2;
          auto middleValue = this->SortedArrayPortal.Get(middleIndex);
          if (middleValue < value)
          {
            // The query value is in the upper half of the remaining range.
            leftIndex = middleIndex + 1;
          }
          else
          {
            // The query value is at middleIndex or in the lower half.
            rightIndex = middleIndex;
          }
        }
        return leftIndex;
      }
    };

    Simple enough, except that the type of ArrayPortalType depends on what device the functor runs on (not to mention its memory might need to be moved to different hardware). We can now solve this problem by creating an execution object that sets this up for a device. ArrayHandles also need to be able to provide portals that run in the control environment, and for that we need a special version of the functor for the control environment.

    template <typename ArrayHandleType>
    struct FindValueExecutionObject : vtkm::cont::ExecutionAndControlObjectBase
    {
      VTKM_IS_ARRAY_HANDLE(ArrayHandleType);
    
      ArrayHandleType SortedArray;
    
      FindValueExecutionObject() = default;
    
      VTKM_CONT FindValueExecutionObject(const ArrayHandleType& sortedArray)
        : SortedArray(sortedArray)
      { }
    
      template <typename Device>
      VTKM_CONT
      FindValueFunctor<decltype(std::declval<ArrayHandleType>().PrepareForInput(Device()))>
      PrepareForExecution(Device device)
      {
        using FunctorType =
          FindValueFunctor<decltype(std::declval<ArrayHandleType>().PrepareForInput(Device()))>;
    
        return FunctorType(this->SortedArray.PrepareForInput(device));
      }
    
      VTKM_CONT
      FindValueFunctor<typename ArrayHandleType::PortalConstControl>
      PrepareForControl()
      {
        using FunctorType =
          FindValueFunctor<typename ArrayHandleType::PortalConstControl>;
    
        return FunctorType(this->SortedArray.GetPortalConstControl());
      }
    };

    Now you can use this execution object in an ArrayHandleTransform. It will automatically be detected as an execution object and be converted to a functor in the execution environment.

    auto transformArray =
      vtkm::cont::make_ArrayHandleTransform(
        inputArray, FindValueExecutionObject<decltype(sortedArray)>(sortedArray));

    Add ArrayHandleView fancy array

    Added a new class named ArrayHandleView that allows you to get a subset of an array. You use the ArrayHandleView by giving it a target array, a starting index, and a length. Here is a simple example of usage:

    vtkm::cont::ArrayHandle<vtkm::Id> sourceArray;
    
    vtkm::cont::ArrayCopy(vtkm::cont::ArrayHandleIndex(10), sourceArray);
    // sourceArray has [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    
    vtkm::cont::ArrayHandleView<vtkm::cont::ArrayHandle<vtkm::Id>>
      viewArray(sourceArray, 3, 5);
    // viewArray has [3, 4, 5, 6, 7]

    There is also a convenience make_ArrayHandleView function to create view arrays. The following makes the same view array as before.

    auto viewArray = vtkm::cont::make_ArrayHandleView(sourceArray, 3, 5);

    ArrayHandleCompositeVector simplified and made writable.

    ArrayHandleCompositeVector is now easier to use, as its type has a more straightforward definition: ArrayHandleCompositeVector<Array1, Array2, ...>. Previously, a helper metaprogramming struct was needed to determine the type of the array handle.

    In addition, the new implementation supports both reading and writing, whereas the original version was read-only.

    Another notable change is that the ArrayHandleCompositeVector no longer supports component extraction from the source arrays. While the previous version could take a source array with a vtkm::Vec ValueType and use only a single component in the output, the new version requires that all input arrays have the same ValueType, which becomes the ComponentType of the output vtkm::Vec.

    When component extraction is needed, the classes ArrayHandleSwizzle and ArrayHandleExtractComponent have been introduced to allow the previous usecases to continue working efficiently.

    ArrayHandleExtractComponent target component is now set at runtime

    Rather than embedding the extracted component in a template parameter, the extract operation is now defined at runtime.

    This is easier to use and keeps compile times / sizes / memory requirements down.
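
    A minimal sketch (array contents are arbitrary, and the make_ArrayHandleExtractComponent helper is assumed to take the runtime component index):

    vtkm::cont::ArrayHandle<vtkm::Vec<vtkm::Float32, 3>> vectors;
    // ... fill vectors ...
    
    // Choose which component to extract at runtime (here the "y" component).
    vtkm::IdComponent component = 1;
    auto yComponents = vtkm::cont::make_ArrayHandleExtractComponent(vectors, component);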

    ArrayHandleSwizzle component maps are now set at runtime

    Rather than embedding the component map in the template parameters, the swizzle operation is now defined at runtime using a vtkm::Vec<vtkm::IdComponent, N> that maps the input components to the output components.

    This is easier to use and keeps compile times / sizes / memory requirements down.
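
    A minimal sketch (array contents are arbitrary, and the make_ArrayHandleSwizzle helper is assumed to take the component-map Vec described above):

    vtkm::cont::ArrayHandle<vtkm::Vec<vtkm::Float32, 4>> rgba;
    // ... fill rgba ...
    
    // The component map is a runtime value; this one reorders (r,g,b,a) to (b,g,r,a).
    vtkm::Vec<vtkm::IdComponent, 4> componentMap(2, 1, 0, 3);
    auto bgra = vtkm::cont::make_ArrayHandleSwizzle(rgba, componentMap);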

    Control Environment

    Interfaces for VTK-m spatial search structures added

    The objective of this feature was to add a common interface to the VTK-m spatial search structures to make them easier to use. VTK-m now distinguishes locators into two types: cell locators and point locators. Cell locators can be used to query for the cell containing a point, and point locators can be used to search for other points that are close to the given point.

    All cell locators are now required to inherit from the interface vtkm::cont::CellLocator, and all point locators are required to inherit from the interface vtkm::cont::PointLocator.

    These interfaces describe the features that are required from either a cell locator or a point locator and provide an easy way to use them in the execution environment.

    By deriving new search structures from these locator interfaces, it also becomes easier for users to build the underlying structures, abstracting away complicated details. After providing all the required data from a vtkm::cont::DataSet object, the user only needs to call the Update method on the vtkm::cont::CellLocator or vtkm::cont::PointLocator object.

    For example, building the cell locator that uses a Bounding Interval Hierarchy tree as a search structure, provided in the class vtkm::cont::BoundingIntervalHierarchy which inherits from vtkm::cont::CellLocator, only requires a few steps.

      // Build a bounding interval hierarchy with 5 splitting planes,
      // and a maximum of 10 cells in the leaf node.
      vtkm::cont::BoundingIntervalHierarchy locator(5, 10);
      // Provide the cell set required by the search structure.
      locator.SetCellSet(cellSet);
      // Provide the coordinate system required by the search structure.
      locator.SetCoordinates(coords);
      // Call the Update method to finish building the underlying tree.
      locator.Update();

    Similarly, users can easily build available point locators as well.

    When using an object of vtkm::cont::CellLocator, or vtkm::cont::PointLocator in the execution environment, they need to be passed to the worklet as an ExecObject argument. In the execution environment, users will receive a pointer to an object of type vtkm::exec::CellLocator, or vtkm::exec::PointLocator respectively. vtkm::exec::CellLocator provides a method FindCell to use in the execution environment to query the containing cell of a point. vtkm::exec::PointLocator provides a method FindNearestNeighbor to query for the nearest point.

    As of now, VTK-m provides only one implementation for each of the given interfaces. vtkm::cont::BoundingIntervalHierarchy which is an implementation of vtkm::cont::CellLocator, and vtkm::cont::PointLocatorUniformGrid, which is an implementation of vtkm::cont::PointLocator.

    vtkm::cont::Algorithm now can be told which device to use at runtime

    vtkm::cont::Algorithm has been extended to support the user specifying which device to use at runtime. Previously, Algorithm would only use the first enabled device, requiring users to modify the vtkm::cont::GlobalRuntimeDeviceTracker if they wanted a specific device used.

    To select a specific device with vtkm::cont::Algorithm pass the vtkm::cont::DeviceAdapterId as the first parameter.

    vtkm::cont::ArrayHandle<double> values;
    
    //call with no tag, will run on first enabled device
    auto result = vtkm::cont::Algorithm::Reduce(values, 0.0);
    
    //call with an explicit device tag, will only run on serial
    vtkm::cont::DeviceAdapterTagSerial serial;
    result = vtkm::cont::Algorithm::Reduce(serial, values, 0.0);
    
    //call with a runtime device id, will only run on serial
    vtkm::cont::DeviceAdapterId device = serial;
    result = vtkm::cont::Algorithm::Reduce(device, values, 0.0);
    

    Support ExecArg behavior in vtkm::cont::Algorithm methods

    vtkm::cont::Algorithm is a wrapper around DeviceAdapterAlgorithm that internally uses TryExecutes to select an appropriate device. The intention is that you can run parallel algorithms (outside of worklets) without having to specify a particular device.

    Most of the arguments given to device adapter algorithms are actually control-side arguments that get converted to execution objects internally (usually a vtkm::cont::ArrayHandle). However, some of the algorithms take an argument that is passed directly to the execution environment, such as the predicate argument of Sort. If the argument is a plain-old-data (POD) type, which is common enough, then you can just pass the object straight through. However, if the object has any special elements that have to be transferred to the execution environment, such as internal arrays, passing it to the vtkm::cont::Algorithm functions becomes problematic.

    To cover this use case, all the vtkm::cont::Algorithm functions now support automatically transferring objects that support the ExecObject worklet convention. If any argument to any of the vtkm::cont::Algorithm functions inherits from vtkm::cont::ExecutionObjectBase, then the PrepareForExecution method is called with the device the algorithm is running on, which allows these device-specific objects to be used without the hassle of creating a TryExecute.
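
    As a sketch of that pattern (the IndexComparator and IndexComparatorExecObject types below are illustrative assumptions, not part of VTK-m), an execution object whose state is an ArrayHandle can be passed directly as the comparator of vtkm::cont::Algorithm::Sort:

    // Device-side comparator that orders indices by the key values they reference.
    template <typename PortalType>
    struct IndexComparator
    {
      PortalType Keys;
    
      VTKM_CONT IndexComparator(const PortalType& keys) : Keys(keys) {}
    
      VTKM_EXEC bool operator()(vtkm::Id a, vtkm::Id b) const
      {
        return this->Keys.Get(a) < this->Keys.Get(b);
      }
    };
    
    // Control-side execution object. Algorithm calls PrepareForExecution with
    // whichever device it ends up running on.
    struct IndexComparatorExecObject : vtkm::cont::ExecutionObjectBase
    {
      vtkm::cont::ArrayHandle<vtkm::Float32> Keys;
    
      template <typename Device>
      VTKM_CONT
      IndexComparator<decltype(std::declval<vtkm::cont::ArrayHandle<vtkm::Float32>>().PrepareForInput(Device()))>
      PrepareForExecution(Device device)
      {
        using Comparator =
          IndexComparator<decltype(std::declval<vtkm::cont::ArrayHandle<vtkm::Float32>>().PrepareForInput(Device()))>;
        return Comparator(this->Keys.PrepareForInput(device));
      }
    };
    
    void SortIndicesByKeys(vtkm::cont::ArrayHandle<vtkm::Id>& indices,
                           const vtkm::cont::ArrayHandle<vtkm::Float32>& keys)
    {
      IndexComparatorExecObject comparator;
      comparator.Keys = keys;
      vtkm::cont::Algorithm::Sort(indices, comparator);
    }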

    vtkm::cont::TryExecuteOnDevice allows for runtime selection of which device to execute on

    VTK-m now offers vtkm::cont::TryExecuteOnDevice to allow the user to select which device to execute a function on at runtime. The original vtkm::cont::TryExecute used the first valid device, which meant users had to modify the runtime state through the RuntimeDeviceTracker, which was verbose and unwieldy.

    Here is an example of how you can try to execute a function on a specific device (for example, the device an array handle was last used on) and fall back to any enabled device if that fails:

    
    struct ArrayCopyFunctor
    {
      template <typename Device, typename InArray, typename OutArray>
      VTKM_CONT bool operator()(Device, const InArray& src, OutArray& dest)
      {
        vtkm::cont::DeviceAdapterAlgorithm<Device>::Copy(src, dest);
        return true;
      }
    };
    
    template<typename T, typename InStorage, typename OutStorage>
    void SmartCopy(vtkm::cont::DeviceAdapterId devId,
                   const vtkm::cont::ArrayHandle<T, InStorage>& src,
                   vtkm::cont::ArrayHandle<T, OutStorage>& dest)
    {
      // First try the requested device (e.g., the device the data was last used
      // on); fall back to any enabled device if that fails.
      bool success = vtkm::cont::TryExecuteOnDevice(devId, ArrayCopyFunctor(), src, dest);
      if (!success)
      {
        vtkm::cont::TryExecute(ArrayCopyFunctor(), src, dest);
      }
    }

    vtkm::cont::CellSetExplicit now caches CellToPoint table when used with Invoke

    Issue #268 highlighted an issue where the expensive CellToPoint table update was not properly cached when a CellSetExplicit was used with a filter. This has been corrected by ensuring that the metadata associated with the table survives shallow copying of the CellSet.

    New methods are also added to check whether the CellToPoint table exists, and also to reset it if needed (e.g. for benchmarking):

    vtkm::cont::CellSetExplicit<> cellSet = ...;
    // Check if the CellToPoint table has already been computed:
    if (cellSet.HasConnectivity(vtkm::TopologyElementTagCell{},
                                vtkm::TopologyElementTagPoint{}))
    {
      // Reset it:
      cellSet.ResetConnectivity(vtkm::TopologyElementTagCell{},
                                vtkm::TopologyElementTagPoint{});
    }

    vtkm::cont::RuntimeDeviceInformation doesn't require a device template

    By making the RuntimeDeviceInformation class template independent, VTK-m is able to detect device info at runtime with a runtime-specified deviceId. In the past this was impossible because the CRTP pattern does not allow function overloading (the compiler would complain that DeviceAdapterRuntimeDetector does not have an Exists() function defined).
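
    A minimal sketch (this assumes the Exists query takes the runtime deviceId, as described above):

    vtkm::cont::DeviceAdapterId deviceId = vtkm::cont::DeviceAdapterTagTBB();
    vtkm::cont::RuntimeDeviceInformation info;
    if (info.Exists(deviceId))
    {
      // The TBB device is compiled in and usable on this machine.
    }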

    vtkm::cont::VirtualObjectHandle can transfer to a device using runtime DeviceAdapterId value

    Previously, VirtualObjectHandle required the caller to know a compile-time device adapter tag to transfer data. This was problematic since in parts of VTK-m you would only have the runtime vtkm::cont::DeviceAdapterId value of the desired device. To then transfer the VirtualObjectHandle you would have to call FindDeviceAdapterTagAndCall. All this extra work was unneeded, as VirtualObjectHandle internally was immediately converting from a compile-time type to a runtime value.

    Here is an example of how you can now transfer a VirtualObjectHandle to a device using a runtime value:

    
    template<typename BaseType>
    const BaseType* moveToDevice(vtkm::cont::VirtualObjectHandle<BaseType>& handle,
                                 vtkm::cont::DeviceAdapterId deviceId)
    {
      return handle.PrepareForExecution(deviceId);
    }

    Add new vtkm::exec and vtkm::cont execution objects

    Recent changes to execution objects have them behave as factories that create an object specific to a particular device. Sometimes, you also need to be able to get an object that behaves properly in the control environment. For these cases, a subclass of vtkm::cont::ExecutionObjectBase was created.

    This subclass is called vtkm::cont::ExecutionAndControlObjectBase. In addition to the PrepareForExecution method required by its superclass, these objects also need to provide a PrepareForControl method to get an equivalent object that works in the control environment.

    See the changelog for ArrayHandleTransform works with ExecObject for an example of using a vtkm::cont::ExecutionAndControlObjectBase.

    Added a ReleaseResources API to CellSet and its derived classes

    We now offer the ability to unload execution memory from vtkm::cont::CellSet and its derived classes (CellSetExplicit, CellSetPermutation, and CellSetStructured) using the ReleaseResourcesExecution method.

    Added a ReleaseResourcesExecution API for Field to unload execution resources

    We now offer the ability to unload execution memory from vtkm::cont::Field using the ReleaseResourcesExecution method.

    Added a ReleaseResourcesExecution API for CoordinateSystem to unload execution resources

    We now offer the ability to unload execution memory from vtkm::cont::ArrayHandleVirtualCoordinates and vtkm::cont::CoordinateSystem using the ReleaseResourcesExecution method.
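
    A minimal sketch covering the three cases above (the objects are assumed to already hold data that has been used on a device):

    void FreeDeviceCopies(vtkm::cont::CellSet& cellSet,
                          vtkm::cont::Field& field,
                          vtkm::cont::CoordinateSystem& coords)
    {
      // Frees the execution (device) copies while keeping the control-side data.
      cellSet.ReleaseResourcesExecution();
      field.ReleaseResourcesExecution();
      coords.ReleaseResourcesExecution();
    }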

    Use the strong typed enums for vtkm::cont::Field

    By doing so, the compiler will not implicitly convert these enums into ints, which could cause unexpected behavior.
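
    For example, a point field is now constructed with the scoped enumeration (the field name and data array here are arbitrary, and the exact constructor overload is an assumption):

    vtkm::cont::ArrayHandle<vtkm::Float32> dataArray;
    // ... fill dataArray with one value per point ...
    
    vtkm::cont::Field pointField("pointvar",
                                 vtkm::cont::Field::Association::POINTS,
                                 dataArray);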

    vtkm::cont::DeviceAdapterId has become a real constexpr type and not an alias to vtkm::UInt8

    As part of the ability to support vtkm::cont::TryExecuteOnDevice VTK-m has made the DeviceAdapterId a real constexpr type instead of a vtkm::UInt8.

    The benefits of a real type are as follows:

    • Easier to add functionality like range verification, which previously had to be located in each user of DeviceAdapterId

    • No ambiguous arguments. Previously it wasn't perfectly clear what a method parameter of vtkm::UInt8 represented. Was it actually the DeviceAdapterId or something else?

    • Ability to add subclasses that represent things such as Undefined, Error, or Any.

    The implementation of DeviceAdapterId is:

    struct DeviceAdapterId
    {
      constexpr explicit DeviceAdapterId(vtkm::Int8 id)
        : Value(id)
      {
      }
    
      constexpr bool operator==(DeviceAdapterId other) const { return this->Value == other.Value; }
      constexpr bool operator!=(DeviceAdapterId other) const { return this->Value != other.Value; }
      constexpr bool operator<(DeviceAdapterId other) const { return this->Value < other.Value; }
    
      constexpr bool IsValueValid() const
      {
        return this->Value > 0 && this->Value < VTKM_MAX_DEVICE_ADAPTER_ID;
      }
    
      constexpr vtkm::Int8 GetValue() const { return this->Value; }
    
    private:
      vtkm::Int8 Value;
    };

    Execution Environment

    User defined execution objects now usable with runtime selection of device adapter

    Changed how execution objects are created and passed from the control environment to the execution environment. Instead, we now fill out a class and call PrepareForExecution(), which creates the execution object for the execution environment. This way we do not have to template the class that extends vtkm::cont::ExecutionObjectBase on the device.

    Example of new execution object:

    template <typename Device>
    struct ExecutionObject
    {
      vtkm::Int32 Number;
    };
    
    struct TestExecutionObject : public vtkm::cont::ExecutionObjectBase
    {
      vtkm::Int32 Number;
    
      template <typename Device>
      VTKM_CONT ExecutionObject<Device> PrepareForExecution(Device) const
      {
        ExecutionObject<Device> object;
        object.Number = this->Number;
        return object;
      }
    };

    Dot function name changed

    The free function vtkm::dot() has been renamed to vtkm::Dot() to be consistent with other vtk-m function names. Aliases are provided for backwards compatibility but will be removed in the next release.

    Added float version operations for vtkm::Math Pi()

    vtkm::Pi<T> now supports float and double as T.

    vtkm::Math Pi functions are now constexpr

    The Pi-related functions are now evaluated at compile time as constexpr functions.
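
    A minimal sketch combining this with the float/double support above:

    // Evaluated at compile time; both float and double are supported as T.
    constexpr vtkm::Float32 piSingle = vtkm::Pi<vtkm::Float32>();
    constexpr vtkm::Float64 piDouble = vtkm::Pi<vtkm::Float64>();
    
    static_assert(piSingle > 3.0f, "Pi<Float32>() is a constant expression");
    static_assert(piDouble > 3.0, "Pi<Float64>() is a constant expression");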

    CellDerivativeFor3DCell has a better version for Vec of Vec fields.

    Previously we would compute a 3x3 matrix where each element was a Vec. Using the Jacobian of a single component is sufficient instead of computing it for each component. This approach saves anywhere from 2 to 3 times the memory space.

    Add a CUDA-safe vtkm::Swap method

    Added a swap implementation that is safe to call from all backends.

    It is not legal to call std functions from CUDA code, and the new vtkm::Swap implements a naive swap when compiled under NVCC while falling back to a std/ADL swap otherwise.
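
    A minimal sketch; the same call compiles for host-only code and for CUDA device code:

    VTKM_EXEC_CONT
    void SwapExample()
    {
      vtkm::Id a = 1;
      vtkm::Id b = 2;
      vtkm::Swap(a, b); // a == 2, b == 1 on any backend
    }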

    Worklets and Filters

    Worklets are now asynchronous on CUDA

    Worklets are now fully asynchronous in the CUDA backend. This means that worklet errors are reported asynchronously. Existing errors are checked for before invocation of a new worklet and at explicit synchronization points like DeviceAdapterAlgorithm<>::Synchronize().

    An important effect of this change is that functions that are synchronization points, like ArrayHandle::GetPortalControl() and ArrayHandle::GetPortalConstControl(), may now throw exceptions for errors from previously executed worklets.

    Worklet invocations, synchronization, and error reporting happen independently on different threads. Therefore, synchronization on one thread does not affect any other threads.

    Worklets now execute on CUDA using grid stride loops

    Previously VTK-m Worklets used what is referred to as a monolithic kernel pattern for worklet execution. This assumes a single large grid of threads to process an entire array in a single pass. This resulted in launches that looked like:

    template<typename F>
    void TaskSingular(F f, vtkm::Id end)
    {
      const vtkm::Id index = static_cast<vtkm::Id>(blockDim.x * blockIdx.x + threadIdx.x);
      if (index < end)
      {
        f(index);
      }
    }
    
    Schedule1DIndexKernel<TaskSingular><<<totalBlocks, 128, 0, CUDAStreamPerThread>>>(
           functor, numInstances);

    This was problematic as it had the drawbacks of:

    • Not being able to reuse any infrastructure between kernel executions.
    • Being harder to tune for performance based on the current hardware.

    The solution was to move to a grid stride loop strategy with a block size based on the number of SMs on the executing GPU. The result is something that looks like:

    template<typename F>
    void TaskStrided(F f, vtkm::Id end)
    {
      const vtkm::Id start = blockIdx.x * blockDim.x + threadIdx.x;
      const vtkm::Id inc = blockDim.x * gridDim.x;
      for (vtkm::Id index = start; index < end; index += inc)
      {
        f(index);
      }
    }
    Schedule1DIndexKernel<TaskStrided><<<32*numSMs, 128, 0, CUDAStreamPerThread>>>(
           functor, numInstances);

    With a loop stride equal to grid size we maintain the optimal memory coalescing patterns as we had with the monolithic version. These changes also allow VTK-m to optimize TaskStrided so that it can reuse infrastructure between iterations.

    Scatter class moved to dispatcher

    Scatter classes are special objects that are associated with a worklet to adjust the standard 1:1 mapping of input to output in the worklet execution to some other mapping with multiple outputs to a single input or skipping over input values. A classic use case is the Marching Cubes algorithm where cube cases will have different numbers of outputs. A scatter object allows you to specify for each output polygon which source cube it comes from.

    Scatter objects have been in VTK-m for some time now (since before the 1.0 release). The way they used to work is that the worklet completely managed the scatter object. It would declare the ScatterType, keep a copy as part of its state, and provide a GetScatter method so that the dispatcher could use it for scheduling.

    The problem with this approach is that it put control-environment-specific state into the worklet. The scatter object would be pushed into the execution environment (like a CUDA device) like the rest of the worklet where it could not be used. It also meant that worklets that defined their own scatter had to declare a bunch more code to manage the scatter.

    This behavior has been changed so that the dispatcher object manages the scatter object. The worklet still declares the type of scatter by declaring a ScatterType (defaulting to ScatterUniform for 1:1 mapping), but its responsibility ends there. When the dispatcher is constructed, it must be given a scatter object that matches the ScatterType of the associated worklet. (If ScatterType has a default constructor, then one can be created automatically.) A worklet may declare a static MakeScatter method for convenience, but this is not necessary.

    As an example, a worklet may declare a custom scatter like this.

      class Generate : public vtkm::worklet::WorkletMapField
      {
      public:
        typedef void ControlSignature(FieldIn<Vec3> inPoints,
                                      FieldOut<Vec3> outPoints);
        typedef void ExecutionSignature(_1, _2);
        using InputDomain = _1;
    
        using ScatterType = vtkm::worklet::ScatterCounting;
    
        template<typename CountArrayType, typename DeviceAdapterTag>
        VTKM_CONT
        static ScatterType MakeScatter(const CountArrayType &countArray,
                                       DeviceAdapterTag)
        {
          VTKM_IS_ARRAY_HANDLE(CountArrayType);
          return ScatterType(countArray, DeviceAdapterTag());
        }

    Note that the ScatterCounting needs to be created with the appropriate indexing arrays to make the scatter behave as the worklet expects, so the worklet provides a helpful MakeScatter method to make it more clear how to construct the scatter.

    This worklet can be invoked as follows.

        auto generateScatter =
            ClipPoints::Generate::MakeScatter(countArray, DeviceAdapterTag());
        vtkm::worklet::DispatcherMapField<ClipPoints::Generate, DeviceAdapterTag>
            dispatcherGenerate(generateScatter);
        dispatcherGenerate.Invoke(pointArray, clippedPointsArray);

    Because the ScatterCounting class does not have a default constructor, you would get a compiler error if you failed to provide one to the dispatcher's constructor. The compiler error will probably not be too helpful to the user, but there is a detailed comment in the dispatcher's code where the compiler error will occur describing what the issue is.

    VTK-m now supports dispatcher parameters being pointers

    Previously it was only possible to pass values to a dispatcher when you wanted to invoke a VTK-m worklet. This caused problems when it came to designing new types that used inheritance, as the types couldn't be passed as the base type to the dispatcher. To fix this issue we now support invoking worklets with pointers, as seen below.

      vtkm::cont::ArrayHandle<T> input;
      //fill input
    
      vtkm::cont::ArrayHandle<T> output;
      vtkm::worklet::DispatcherMapField<WorkletType> dispatcher;
    
      dispatcher.Invoke(&input, output);
      dispatcher.Invoke(input, &output);
      dispatcher.Invoke(&input, &output);

    Histogram filter supports custom types

    By passing TypeList and StorageList types into FieldRangeGlobalCompute, upstream users (such as VTK) can pass custom types into the histogram filter.

    Cell measure functions, worklet, and filter

    VTK-m now provides free functions, a worklet, and a filter for computing the integral measure of a cell (i.e., its arc length, area, or volume).

    The free functions are located in vtkm/exec/CellMeasure.h and share the same signature:

      template<typename OutType, typename PointCoordVecType>
      OutType CellMeasure(
        const vtkm::IdComponent& numPts,
        const PointCoordVecType& pts,
        CellShapeTag,
        const vtkm::exec::FunctorBase& worklet);

    The number of points argument is provided for cell-types such as lines, which allow an arbitrary number of points per cell. See the worklet for examples of their use.

    The worklet is named vtkm::worklet::CellMeasure and takes a template parameter that is a tag list of measures to include. Cells that are not selected by the tag list return a measure of 0. Some convenient tag lists are predefined for you:

    • vtkm::ArcLength will only compute the measure of cells with a 1-dimensional parameter-space.
    • vtkm::Area will only compute the measure of cells with a 2-dimensional parameter-space.
    • vtkm::Volume will only compute the measure of cells with a 3-dimensional parameter-space.
    • vtkm::AllMeasures will compute all of the above.

    The filter version, named vtkm::filter::CellMeasures (plural since it produces a cell-centered array of measures), takes the same template parameter and tag lists as the worklet. By default, the output array of measure values is named "measure", but the filter accepts other names via the SetCellMeasureName() method.

    The only cell type that is not supported is the polygon; you must triangulate polygons before running this filter. See the unit tests for examples of how to use the worklet and filter.
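
    As a sketch (the input data set is assumed to contain 3-D cells), using the filter might look like:

    void ComputeCellVolumes(const vtkm::cont::DataSet& input)
    {
      // Restrict the measure to 3-D cells; other cell types get a measure of 0.
      vtkm::filter::CellMeasures<vtkm::Volume> measures;
      measures.SetCellMeasureName("measure"); // "measure" is also the default name
      auto result = measures.Execute(input);
      (void)result;
    }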

    The cell measures are all signed: negative measures indicate that the cell is inverted. Simplicial cells (points, lines, triangles, tetrahedra) cannot be inverted by definition and thus always return values greater than or equal to 0.0. Negative values indicate either that the order in which vertices appear in the connectivity array is improper or that the relative locations of the vertices in world coordinates result in a cell with a negative Jacobian somewhere in its interior. Finally, note that cell measures may return invalid (NaN) or infinite (Inf, -Inf) values if the cell is poorly defined, e.g., has coincident vertices or a parametric dimension larger than the space spanned by its world-coordinate vertices.

    The verdict mesh quality library was used as the source of the methods for approximating the cell measures.

    Add a WaveletGenerator worklet (e.g. vtkRTAnalyticSource)

    Add a VTK-m implementation of VTK's vtkRTAnalyticSource, or "Wavelet" source as it is known in ParaView. This is a customizable dataset with properties that make it useful for testing and benchmarking various algorithms.

    Add a filter to support Lagrangian analysis capabilities

    Lagrangian analysis operates in two phases: phase one involves the extraction of flow field information; phase two involves calculating new particle trajectories using the saved information.

    The Lagrangian filter can be used to extract flow field information from a time-varying vector field. The extracted information is in the form of particle trajectories.

    The filter operates by first being set up with some information regarding step size, the interval at which information should be saved (write frequency), and the number of seeds to be placed in the domain (specified as a reduction factor along each axis of the original dimensions). The step size should be equivalent to the time between vector field data inputs. The write frequency corresponds to the number of cycles between saves.

    Filter execution is called for each cycle of the simulation data. Each filter execution call requires a velocity field to advect particles forward.

    The extracted particle trajectories, referred to as basis flows, exist in the domain for the specified interval (write frequency). Particles are then reset along a uniform grid and new particle trajectories are calculated.

    An example of using the Lagrangian filter is at vtk-m/examples/lagrangian. The basis flows are saved into a folder named output, which needs to be created in the directory in which the program is being executed.

    The basis flows can be interpolated using barycentric coordinate interpolation or a form of linear interpolation to calculate new particle trajectories post hoc.

    An example of using basis flows generated by the Lagrangian filter is at vtk-m/examples/posthocinterpolation. The folder contains a script which specifies parameters which need to be provided to use the example.

    Add a warp vector worklet and filter

    This commit adds a worklet that modifies point coordinates by moving points along a given vector scaled by a given factor. It's a simplified version of the vtkWarpVector class in VTK. Additionally, the filter doesn't modify the point coordinates, but creates new point coordinates that have been warped. This is useful for showing flow profiles or mechanical deformation.

    Add a warp scalar worklet and filter

    This commit adds a worklet as well as a filter that modify point coordinates by moving points along point normals by the scalar amount times the scalar factor. It's a simplified version of the vtkWarpScalar class in VTK. Additionally, the filter doesn't modify the point coordinates, but creates new point coordinates that have been warped.

    Add a split sharp edge filter

    It's a filter that splits sharp manifold edges where the feature angle between the adjacent surfaces is larger than the threshold value. When an edge is split, it adds a new point to the coordinates and updates the connectivity of an adjacent surface. For example, consider two adjacent triangles (0,1,2) and (2,1,3) where edge (1,2) needs to be split. Two new points, 4 (a duplicate of point 1) and 5 (a duplicate of point 2), would be added, and the latter triangle's connectivity would be changed to (5,4,3). By default, all of the old point's fields would be copied to the new point. Use with caution.

    Time-varying "oscillator" filter and example

    The oscillator is a simple analytical source of time-varying data. It provides a function value at each point of a uniform grid that is computed as a sum of Gaussian kernels — each with a specified position, amplitude, frequency, and phase.

    The example (in examples/oscillator) generates volumetric Cinema datasets that can be viewed in a web browser with ArcticViewer.

    Build

    Build System Redesign and new minimum CMake

    The VTK-m CMake build system was redesigned to be more declarative for consumers. This was done by moving away from the previous component design and instead using explicit targets. Additionally, VTK-m now uses the native CUDA support introduced in CMake 3.8 and has the following minimum CMake versions:

    • Visual Studio Generator requires CMake 3.11+
    • CUDA support requires CMake 3.9+
    • OpenMP support requires CMake 3.9+
    • Otherwise CMake 3.3+ is supported

    When VTK-m is found with find_package, it defines the following targets:

    • vtkm_cont

      • contains all common core functionality
      • always exists
    • vtkm_rendering

      • contains all the rendering code
      • exists only when rendering is enabled
      • rendering also provides a vtkm_find_gl function
        • allows you to find the GL (EGL,MESA,Hardware), GLUT, and GLEW versions that VTK-m was built with.

    VTK-m also provides targets that represent what device adapters it was built to support. The pattern for these targets is vtkm::<device>. Currently we don't provide a target for the serial device.

    • vtkm::tbb

      • Target that contains tbb related link information implicitly linked to by vtkm_cont if tbb was enabled
    • vtkm::openmp

      • Target that contains openmp related link information implicitly linked to by vtkm_cont if openmp was enabled
    • vtkm::CUDA

      • Target that contains CUDA related link information implicitly linked to by vtkm_cont if CUDA was enabled

    VTK-m can be built with specific CPU architecture vectorization/optimization flags. Consumers of the project can find these flags by looking at the vtkm_vectorization_flags target.

    So a project that wants to build an executable that uses vtk-m would look like:

    
    cmake_minimum_required(VERSION 3.3 FATAL_ERROR)
    project(HelloWorld CXX)
    
    #Find the VTK-m package.
    #Will automatically enable the CUDA language if needed ( and bump CMake minimum )
    
    find_package(VTKm REQUIRED)
    
    add_executable(HelloWorld HelloWorld.cxx)
    target_link_libraries(HelloWorld PRIVATE vtkm_cont)
    
    if(TARGET vtkm::CUDA)
      set_source_files_properties(HelloWorld.cxx PROPERTIES LANGUAGE CUDA)
    endif()
    

    Add none option to VTKm_CUDA_Architecture

    A new VTKm_CUDA_Architecture option called none has been added. This will disable all VTK-m generated CUDA architecture flags, allowing the user to specify their own custom flags.

    Useful when VTK-m is used as a library in another project and the project wants to use its own architecture flags.

    Add a new cmake option: VTKm_ENABLE_DEVELOPER_FLAGS

    The new cmake option VTKm_ENABLE_DEVELOPER_FLAGS can be used to enable/disable warnings in VTK-m. It is useful to disable VTK-m's warning flags when VTK-m is directly embedded by a project as sub project (add_subdirectory), and the warnings are too strict for the project. This does not apply when using an installed version of VTK-m.

    For example, this flag is disabled in VTK.

    This flag is enabled by default.

    Add a new cmake option: VTKm_INSTALL_ONLY_LIBRARIES

    The new cmake option VTKm_INSTALL_ONLY_LIBRARIES when enabled will cause VTK-m to only install libraries. This is useful for projects that are producing an application and don't want to ship headers or CMake infrastructure.

    For example, this flag is enabled by ParaView for releases.

    This flag is disabled by default.

    Other

    Control CUDA managed memory with an environment variable

    By setting the environment variable "VTKM_MANAGEDMEMO_DISABLED" to 1, users are able to disable CUDA managed memory even when the hardware is capable of it.

    Add support for deferred freeing of CUDA memory

    A new function, void CudaAllocator::FreeDeferred(void* ptr, std::size_t numBytes) has been added that can be used to defer the freeing of CUDA memory to a later point. This is useful because cudaFree causes a global sync across all CUDA streams. This function internally maintains a pool of to-be-freed pointers that are freed together when a size threshold is reached. This way a number of global syncs are collected together at one point.

    Allow variable arguments to VTKM_TEST_ASSERT

    The VTKM_TEST_ASSERT macro is a very useful tool for performing checks in tests. However, it is rather annoying to have to always specify a message for the assert. Often the failure is self-evident from the condition (which is already printed out), and specifying a message is both repetitive and annoying.

    Also, it is often equally annoying to print out additional information in the case of an assertion failure. In that case, you have to either attach a debugger or add a printf, see the problem, and remove the printf.

    This change solves both of these problems. VTKM_TEST_ASSERT now takes a condition and a variable number of message arguments. If no message arguments are given, then a default message (along with the condition) is output. If multiple message arguments are given, they are appended together in the result. The messages do not have to be strings. Any object that can be sent to a stream will be printed correctly. This allows you to print out the values that caused the issue.

    The old behavior of VTKM_TEST_ASSERT still works, so you can have a statement like

    VTKM_TEST_ASSERT(array.GetNumberOfValues() != 0, "Array is empty");

    As before, if this assertion failed, you would get the following error message.

    Array is empty (array.GetNumberOfValues() != 0)

    However, in the statement above, you may feel that it is self evident that array.GetNumberOfValues() == 0 means the array is empty and you have to type this into your test, like, 20 times. You can save yourself some work by dropping the message.

    VTKM_TEST_ASSERT(array.GetNumberOfValues() != 0);

    In this case if the assertion fails, you will get a message like this.

    Test assertion failed (array.GetNumberOfValues() != 0)

    But perhaps you have the opposite problem. Perhaps you need to output more information. Let's say that you expected a particular operation to half the length of an array. If the operation fails, it could be helpful to know how big the array actually is. You can now actually output that on failure by adding more message arguments.

    VTKM_TEST_ASSERT(outarray.GetNumberOfValues() == inarray.GetNumberOfValues()/2,
                     "Expected array size ",
                     inarray.GetNumberOfValues()/2,
                     " but got ",
                     outarray.GetNumberOfValues());

    In this case, if the test failed, you might get an error like this.

    Expected array size 5 but got 6 (outarray.GetNumberOfValues() == inarray.GetNumberOfValues()/2)

    Support constexpr and variadic constructor for Vec

    Add variadic constructors to the vtkm::Vec classes. The main advantage of this addition is that it makes it much easier to initialize Vecs of arbitrary length.

    Meanwhile, Vec classes constructed with values listed in their parameters up to size 4 are constructed as constant expressions at compile time to reduce runtime overhead. Sizes greater than 4 are not yet supported to be constructed at compile time via initializer lists since in C++11 constexpr does not allow for loops. Only on Windows platform with a compiler older than Visual Studio 2017 version 15.0, users are allowed to use initializer lists to construct a vec with size > 4.

    vtkm::make_Vec would always construct Vec at compile time if possible.

    vtkm::Vec<vtkm::Float64, 3> vec1{1.1, 2.2, 3.3};  // New better initializer since
                                                      // it does not allow type narrowing
    
    vtkm::Vec<vtkm::Float64, 3> vec2 = {1.1, 2.2, 3.3}; // Nice syntax also supported by
                                                        // initializer lists.
    
    vtkm::Vec<vtkm::Float64, 3> vec3 = vtkm::make_Vec(1.1, 2.2, 3.3); // Old style that still works.
    
    vtkm::Vec<vtkm::Float64, 3> vec4(1.1, 2.2, 3.3); // Old style that still works but
                                                     // should be deprecated. Reason listed below.

    Nested initializer lists work to initialize Vec of Vecs. If the size is no more than 4, it's always constructed at compile time if possible.

    vtkm::Vec<vtkm::Vec<vtkm::Float64, 2>, 3> vec{ {1.1, 2.2}, {3.3, 4.4}, {5.5, 6.6} };
                                                               //Constructed at compile time

    One drawback of the std::initializer_list implementation is that it constructs larger Vecs (size > 4) of scalars or Vecs at run time.

    vtkm::Vec<vtkm::Float64, 5> vec1{1.1, 2.2, 3.3, 4.4, 5.5}; // Constructed at run time.
    
    vtkm::Vec<vtkm::Vec<vtkm::Float64, 2>, 5> vec2{{1.1, 1.1},{2.2, 2.2},{3.3, 3.3},
                                          {4.4, 4.4}, {5.5, 5.5}}; // Constructed at run time.

    Parenthesis constructor would report an error if the size is larger than 4 when being used to construct a Vec of Vecs. If it's being used to construct a Vec of scalars then it's fine.

    vtkm::Vec<vtkm::Vec<vtkm::Float64, 2>, 5> vec2({1.1, 1.1},{2.2, 2.2},{3.3, 3.3},
                                          {4.4, 4.4}, {5.5, 5.5}); // ERROR! This type of
                                                                   // constructor not implemented!
    
    vtkm::Vec<vtkm::Float64, 5> vec1(1.1, 2.2, 3.3, 4.4, 5.5); // Constructed at compile time.

    If a vtkm::Vec is initialized with a list of size one, then that one value is replicated for all components.

    vtkm::Vec<vtkm::Float64, 3> vec{1.1};  // vec gets [ 1.1, 1.1, 1.1 ]

    This "scalar" initialization also works for Vec of Vecs.

    vtkm::Vec<vtkm::Vec<vtkm::Float64, 2>, 3> vec1{ { 1.1, 2.2 } };
    // vec1 is [[1.1, 2.2], [1.1, 2.2], [1.1, 2.2]]
    
    vtkm::Vec<vtkm::Vec<vtkm::Float64, 2>, 3> vec2{ { 3.3}, { 4.4 }, { 5.5 } };
    // vec2 is [[3.3, 3.3], [4.4, 4.4], [5.5, 5.5]]

    vtkm::make_Vec is also updated to support an arbitrary number of initial values, which are constructed at compile time.

    // Creates a vtkm::Vec<vtkm::Float64, 5>
    auto vec = vtkm::make_Vec(1.1, 2.2, 3.3, 4.4, 5.5);

    This is super convenient when dealing with variadic function arguments.

    template <typename... Ts>
    void ExampleVariadicFunction(const Ts&... params)
    {
      auto vec = vtkm::make_Vec(params...);
      // ... use vec ...
    }

    Of course, this assumes that the type of all the parameters is the same. If not, you could run into compiler trouble.
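    For illustration, a hypothetical call to the function above might look like the following; the deduced Vec type depends on the common type of the arguments.

    // Hypothetical usage: all arguments share the type vtkm::Float64,
    // so make_Vec produces a vtkm::Vec<vtkm::Float64, 4> inside the function.
    ExampleVariadicFunction(1.0, 2.0, 3.0, 4.0);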

    Note that vtkm::make_Vec does not accept an std::initializer_list:

    // Creates a vtkm::Vec<vtkm::Float64, 3>
    auto vec1 = vtkm::make_Vec<3>({1.1, 2.2, 3.3}); // ERROR
    
    // Creates exactly the same thing but compiles
    auto vec1 = vtkm::make_Vec<3>(1.1, 2.2, 3.3);

    A limitation of the initializer list constructor is that the compiler has no way to check the length of the list or force it to a particular length. Thus, it is entirely possible to construct a Vec with the wrong number of arguments. Or, more to the point, the compiler will let you do it, but there is an assert in the constructor to check for that. (Of course, asserts are not compiled in release builds.)

    // This will compile, but its results are undefined when it is run.
    // In debug builds, it will fail an assert.
    vtkm::Vec<vtkm::Float64, 3> vec{1.1, 1.2};

    vtkm::Vec< vtkm::Vec<T> > can't be constructed from vtkm::Vec<U>

    Previously, when you had a Vec<Vec<float,3>>, it was possible to incorrectly initialize it with the contents of a Vec<double,3>. An example of this is:

    using Vec3d = vtkm::Vec<double, 3>;
    using Vec3f = vtkm::Vec<float, 3>;
    using Vec3x3f = vtkm::Vec<Vec3f, 3>;
    
    Vec3d x(0.0, 1.0, 2.0);
    Vec3x3f b(x); // becomes [[0,0,0],[1,1,1],[2,2,2]]
    Vec3x3f c(x, x, x); // becomes [[0,1,2],[0,1,2],[0,1,2]]
    Vec3x3f d(Vec3f(0.0f, 1.0f, 2.0f)); // becomes [[0,0,0],[1,1,1],[2,2,2]]

    The solution we have chosen is to disallow the construction of objects such as b. This still allows the free implicit cast from double to float.
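    As a rough sketch of the intent of the change (assuming component-wise casts between Vecs of the same shape remain allowed, as the release note states), the behavior is now approximately:

    using Vec3d = vtkm::Vec<double, 3>;
    using Vec3f = vtkm::Vec<float, 3>;
    using Vec3x3f = vtkm::Vec<Vec3f, 3>;
    
    Vec3d x(0.0, 1.0, 2.0);
    // Vec3x3f b(x);        // Now a compile error: a Vec of Vecs can no longer be
    //                      // built from a Vec with a different component type.
    Vec3f y(x);             // Still fine: components are implicitly cast from
                            // double to float.
    Vec3x3f c(y, y, y);     // Still fine: explicit per-component construction.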

    Use std::call_once to construct singletons

    By using std::call_once from C++11, we can simplify the logic in code where the same variables are queried from multiple threads.
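    A minimal sketch of the pattern follows; the function and variable names are hypothetical and not taken from the VTK-m sources.

    #include <mutex>
    
    // Illustrative std::call_once pattern: the initialization runs exactly once,
    // even if many threads call this function concurrently.
    int& GetSharedValue()
    {
      static int sharedValue = 0;
      static std::once_flag initialized;
      std::call_once(initialized, []() {
        sharedValue = 42; // stand-in for an expensive one-time query
      });
      return sharedValue;
    }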

    Use thread_local in vtkm::cont::GetGlobalRuntimeDeviceTracker function if possible

    This reduces the cost of getting the thread's runtime device tracker and lowers the runtime overhead when the user constructs many short-lived threads that use VTK-m.
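    A minimal sketch of the thread_local pattern, assuming a stand-in tracker type (this is not the actual VTK-m implementation):

    // RuntimeDeviceTracker here is a placeholder for the real VTK-m class.
    struct RuntimeDeviceTracker { /* ... */ };
    
    RuntimeDeviceTracker& GetGlobalRuntimeDeviceTracker()
    {
      // Each thread gets its own tracker, constructed lazily on first use,
      // so repeated calls from the same thread avoid any locking.
      thread_local RuntimeDeviceTracker tracker;
      return tracker;
    }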

    Replace std::random_shuffle with std::shuffle

    std::random_shuffle is deprecated in C++14 because it relies on std::rand, which does not guarantee a uniform distribution and whose underlying algorithm is unspecified. Using std::shuffle with a 64-bit random number engine provides a more reliable result.
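    For illustration, the replacement pattern looks roughly like the following sketch (not VTK-m's exact testing code); a fixed-seed 64-bit Mersenne Twister engine keeps the shuffled input reproducible.

    #include <algorithm>
    #include <random>
    #include <vector>
    
    #include <vtkm/Types.h>
    
    // Hypothetical helper showing the replacement of std::random_shuffle.
    void ShuffleValues(std::vector<vtkm::Id>& values)
    {
      // std::random_shuffle(values.begin(), values.end()); // deprecated in C++14
      std::mt19937_64 rng(0); // 64-bit engine with a fixed seed for reproducibility
      std::shuffle(values.begin(), values.end(), rng);
    }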

  • v1.2.0   VTK-m 1.2.0 1.2.0 is our third official release. This release is stable and should compile across many platforms. However, note that VTK-m is still under heavy development and each release contains backward-incompatible changes.
    9cb15223 · Release VTK-m 1.2.0 ·
  • v1.1.0   VTK-m 1.1.0 is our second official release. This release is stable and should compile across many platforms. However, note that VTK-m is still under heavy development and each release contains backward-incompatible changes.
  • v1.0.0   VTK-m 1.0.0

    VTK-m 1.0.0 is our first official release. This release is stable and should compile across many platforms. However, note that VTK-m is still under heavy development. We will be following up this release soon with a version 1.1, which may contain some backward-incompatible changes.