Allow masking of worklet invocations

Several recent use cases would benefit from masking out some of the invocations of a worklet. Today, if you invoke a worklet with a mask array on the input domain, you might implement the worklet more or less like the following.

VTKM_EXEC void operator()(bool mask, /* other parameters */)
{
  if (mask)
  {
    // Do interesting stuff
  }
}

This works, but what if your mask has mostly false values? In that case, you spend a lot of time running invocations that do nothing and moving field data to and from memory for no reason.

You could potentially work around this problem by adding a scatter to the worklet. However, a scatter compresses the output arrays so that they contain only the values that are active in the mask. That is a problem if you want the masked outputs to land in their proper places in the original arrays: you would have to perform complex (and annoying, and possibly expensive) permutations of the output arrays.
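To make the cost of the scatter workaround concrete, the sketch below shows the extra pass it implies: the worklet produces one value per selected index, and that compact result then has to be written back into a full-sized array. This is a plain C++ illustration using std::vector in place of VTK-m array handles; the function name ScatterBackToFullOutput is hypothetical and not part of VTK-m.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration of the scatter workaround: the compact output
// (one value per selected index) must be permuted back into the full-sized
// output array after the worklet runs.
std::vector<float> ScatterBackToFullOutput(const std::vector<float>& compactOutput,
                                           const std::vector<std::size_t>& selectedIndices,
                                           std::vector<float> fullOutput)
{
  assert(compactOutput.size() == selectedIndices.size());
  for (std::size_t i = 0; i < compactOutput.size(); ++i)
  {
    // Extra pass (and extra memory traffic) that a mask would make unnecessary.
    fullOutput[selectedIndices[i]] = compactOutput[i];
  }
  return fullOutput;
}
```

With a mask, the worklet would write directly into the full-sized output, and this copy would not exist.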

Thus, we would like a new feature similar to scatter that instead masks out invocations so that the worklet is simply not run on those outputs.

Proposed Interface

We propose to have a "Mask" feature that is similar (and orthogonal) to the existing "Scatter" feature. Worklet objects will define a MaskType that provides an object that manages the selection of which invocations are skipped. The following Mask objects will be defined.

MaskNone - This removes any mask of the output. All outputs are generated. This is the default if no MaskType is explicitly defined.
MaskSelect - Requires the user to provide an array that defines which values are masked out. This can be done either with an array of 0's and 1's or with an array of all indices to pass (one can be derived from the other).
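The two MaskSelect input forms are interchangeable, as the list above notes. The serial sketch below converts between them, with std::vector standing in for array handles and a local Id alias standing in for vtkm::Id; the function names are hypothetical. On a device, the 0/1-to-indices direction would more likely be a stream compaction (for example, something like vtkm::cont::Algorithm::CopyIf applied to an index array), but that mapping is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Id = std::int64_t; // stand-in for vtkm::Id

// Builds the list of indices to pass from a stencil of 0's and 1's.
std::vector<Id> StencilToIndices(const std::vector<std::uint8_t>& stencil)
{
  std::vector<Id> indices;
  for (Id i = 0; i < static_cast<Id>(stencil.size()); ++i)
  {
    if (stencil[static_cast<std::size_t>(i)] != 0)
    {
      indices.push_back(i);
    }
  }
  return indices;
}

// Rebuilds the 0/1 stencil from the list of passing indices.
std::vector<std::uint8_t> IndicesToStencil(const std::vector<Id>& indices, Id outputSize)
{
  std::vector<std::uint8_t> stencil(static_cast<std::size_t>(outputSize), 0);
  for (Id index : indices)
  {
    assert(index >= 0 && index < outputSize);
    stencil[static_cast<std::size_t>(index)] = 1;
  }
  return stencil;
}
```

The index form is the one a mask actually needs at dispatch time, because it directly gives the thread-to-output map.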

It will be straightforward to implement other versions of masks. (For example, you could make a mask class that selects every Nth entry.) Those could be made on an as-needed basis.

Implementation

The implementation will follow the same basic idea of how scatters are implemented.

Mask Classes

The mask class will be required to implement the following items.

ThreadToOutputType - A type for an array that maps a thread index (an index in the array) to an output index. A reasonable type for this could be vtkm::cont::ArrayHandle<vtkm::Id>.
GetThreadToOutputMap - Given the range for the output (e.g. the number of items in the output domain), returns an array of type ThreadToOutputType that is the actual map.
GetThreadRange - Given a range for the output (e.g. the number of items in the output domain), returns the range for the threads (e.g. the number of times the worklet will be invoked).
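As an illustration of the three required items, here is a sketch of the "every Nth entry" mask mentioned earlier, written in plain C++. MaskEveryNth is hypothetical, std::vector stands in for the ThreadToOutputType array handle, and only the 1D range is handled. In VTK-m itself, an implicit array such as vtkm::cont::ArrayHandleCounting could probably serve as the map so that no memory is allocated, but that choice is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Id = std::int64_t; // stand-in for vtkm::Id

// Hypothetical mask that keeps every Nth output (outputs 0, N, 2N, ...).
struct MaskEveryNth
{
  // std::vector stands in for an ArrayHandle of output indices.
  using ThreadToOutputType = std::vector<Id>;

  explicit MaskEveryNth(Id stride)
    : Stride(stride)
  {
    assert(stride > 0);
  }

  // Number of worklet invocations: ceil(outputRange / Stride).
  Id GetThreadRange(Id outputRange) const
  {
    return (outputRange + this->Stride - 1) / this->Stride;
  }

  // Thread i writes to output i * Stride.
  ThreadToOutputType GetThreadToOutputMap(Id outputRange) const
  {
    ThreadToOutputType map(static_cast<std::size_t>(this->GetThreadRange(outputRange)));
    for (std::size_t thread = 0; thread < map.size(); ++thread)
    {
      map[thread] = static_cast<Id>(thread) * this->Stride;
    }
    return map;
  }

  Id Stride;
};
```

For an output range of 10 and a stride of 3, the worklet runs 4 times and writes outputs 0, 3, 6, and 9.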

Here is an example of how the MaskNone class will be implemented.

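#include <vtkm/cont/ArrayHandleIndex.h>

// Mask that passes every output: thread i writes output i.
struct MaskNone
{
  // Implicit array whose value at index i is i, so no memory is allocated.
  using ThreadToOutputType = vtkm::cont::ArrayHandleIndex;

  // 1D output range: the map is the identity over outputRange entries.
  VTKM_CONT ThreadToOutputType GetThreadToOutputMap(vtkm::Id outputRange) const
  {
    return ThreadToOutputType(outputRange);
  }

  // 3D output range: the map is flat, so its length is the total number of outputs.
  VTKM_CONT ThreadToOutputType GetThreadToOutputMap(vtkm::Id3 outputRange) const
  {
    return ThreadToOutputType(outputRange[0] * outputRange[1] * outputRange[2]);
  }

  // Every output gets a thread, so the thread range equals the output range.
  template <typename RangeType>
  VTKM_CONT RangeType GetThreadRange(RangeType outputRange) const
  {
    return outputRange;
  }
};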
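The Id3 overload of GetThreadToOutputMap returns a flat array whose length is the product of the three dimensions. That works only if each 3D thread index is flattened to a single value before it is used to look up the map. The sketch below shows one such flattening, assuming the first (x) index varies fastest; the function names are hypothetical, std::array stands in for vtkm::Id3, and the exact convention used by VTK-m's 3D scheduling is an assumption here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Id = std::int64_t;        // stand-in for vtkm::Id
using Id3 = std::array<Id, 3>;  // stand-in for vtkm::Id3

// Length of the flat ThreadToOutputMap that MaskNone builds for a 3D range.
Id FlatSize(const Id3& range)
{
  return range[0] * range[1] * range[2];
}

// Flattens a 3D thread index into [0, FlatSize(range)), x varying fastest.
Id FlattenIndex(const Id3& index, const Id3& range)
{
  return index[0] + range[0] * (index[1] + range[1] * index[2]);
}
```

The last 3D index, (3, 2, 1) in a 4 x 3 x 2 range, flattens to 23, the last entry of a 24-entry map.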

Dispatching

The vtkm::worklet::internal::DispatcherBase will manage a mask class in the same way it manages the scatter class. It will get the MaskType from the worklet it is templated on. It will require a MaskType object during its construction. (That also means that all existing dispatchers will have to be modified to also accept mask objects in their constructors.)
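The sketch below shows the pattern described here, together with the default from the interface section (MaskNone when a worklet defines no MaskType): a worklet base class supplies default ScatterType and MaskType aliases, a derived worklet may override MaskType, and the dispatcher takes both types from the worklet and accepts a mask object in its constructor. All class names are hypothetical stand-ins, not the actual VTK-m classes.

```cpp
#include <cassert>
#include <type_traits>

struct ScatterIdentitySketch {};
struct MaskNoneSketch {};
struct MaskSelectSketch { int Placeholder; };

// Hypothetical worklet base: derived worklets inherit these defaults unless
// they declare their own ScatterType or MaskType.
struct WorkletBaseSketch
{
  using ScatterType = ScatterIdentitySketch;
  using MaskType = MaskNoneSketch;
};

struct DefaultWorklet : WorkletBaseSketch {};

struct MaskedWorklet : WorkletBaseSketch
{
  using MaskType = MaskSelectSketch; // overrides the default
};

// Hypothetical dispatcher that stores a mask next to the scatter, both typed
// from the worklet it is templated on.
template <typename WorkletType>
class DispatcherSketch
{
public:
  using ScatterType = typename WorkletType::ScatterType;
  using MaskType = typename WorkletType::MaskType;

  explicit DispatcherSketch(const WorkletType& worklet = WorkletType(),
                            const ScatterType& scatter = ScatterType(),
                            const MaskType& mask = MaskType())
    : Worklet(worklet)
    , Scatter(scatter)
    , Mask(mask)
  {
  }

  WorkletType Worklet;
  ScatterType Scatter;
  MaskType Mask;
};
```

Default arguments keep existing call sites that pass only a worklet (or a worklet and a scatter) compiling, which is one possible way to limit the churn of modifying every dispatcher.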

In the internal DispatcherBaseTryExecuteFunctor class, the range will be modified by the mask's GetThreadRange after the range is modified by the scatter. The implementation will change the statement

    self->InvokeTransportParameters(
      invocation, dimensions, self->Scatter.GetOutputRange(dimensions), device);

to

    self->InvokeTransportParameters(
      invocation,
      dimensions,
      self->Mask.GetThreadRange(self->Scatter.GetOutputRange(dimensions)),
      device);
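The composition in the new statement can be checked with a small model: the scatter turns the input range into an output range, and the mask turns that output range into the number of worklet invocations. ScatterFixedCount (similar in spirit to a uniform scatter such as vtkm::worklet::ScatterUniform) and MaskEveryOther are hypothetical stand-ins handling only 1D ranges.

```cpp
#include <cassert>
#include <cstdint>

using Id = std::int64_t; // stand-in for vtkm::Id

// Hypothetical scatter that produces a fixed number of outputs per input.
struct ScatterFixedCount
{
  Id OutputsPerInput;
  Id GetOutputRange(Id inputRange) const { return inputRange * this->OutputsPerInput; }
};

// Hypothetical mask that keeps every other output (outputs 0, 2, 4, ...).
struct MaskEveryOther
{
  Id GetThreadRange(Id outputRange) const { return (outputRange + 1) / 2; }
};

// Mirrors the proposed change: the mask is applied to the range produced by
// the scatter, and the result is the number of worklet invocations.
template <typename ScatterType, typename MaskType>
Id ComputeThreadRange(const ScatterType& scatter, const MaskType& mask, Id inputRange)
{
  return mask.GetThreadRange(scatter.GetOutputRange(inputRange));
}
```

With 5 inputs and 3 outputs per input, the scatter yields 15 outputs and the mask reduces that to 8 invocations.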

The vtkm::Invocation class will be changed to hold the ThreadToOutputMap array from the mask. It will likewise have a templated ChangeThreadToOutputMap method added (similar to those already existing for the arrays from a scatter). This method will be used in DispatcherBase::InvokeTransportParameters to add the mask's array to the invocation before calling InvokeSchedule.
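A templated Change method of this kind typically returns a new invocation object whose type reflects the new member type, rather than modifying the existing object. The sketch below shows that pattern on a heavily simplified, hypothetical Invocation that holds only the thread-to-output map; the real class carries many more members, and its exact layout is not assumed here.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for vtkm::Invocation that only
// tracks the thread-to-output map.
template <typename ThreadToOutputMapType>
struct Invocation
{
  ThreadToOutputMapType ThreadToOutputMap;

  // Returns a copy of this invocation whose map has a (possibly) different
  // type, in the style of the existing methods for the scatter arrays.
  template <typename NewMapType>
  Invocation<NewMapType> ChangeThreadToOutputMap(NewMapType newMap) const
  {
    return Invocation<NewMapType>{ std::move(newMap) };
  }
};

// Placeholder map type used before the mask's array is attached.
struct NullMap {};
```

Because the map type becomes part of the invocation's type, the Task classes can access the map without any type erasure.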

Thread Indices

With the addition of masks, the ThreadIndices classes need to be changed to manage the actual output index. Previously, the output index was always the same as the thread index. However, now these two can be different. The GetThreadIndices methods of the worklet base classes will have an argument added that is the portal to the ThreadToOutputMap.

The worklet GetThreadIndices is called from the Task classes. These classes will have to be changed to pass in this additional argument. Since the Task classes get an Invocation object from the dispatcher, which will contain the ThreadToOutputMap, this change is trivial.
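The sketch below models the intended separation of thread index and output index. A serial loop stands in for a Task: it runs once per thread, looks up the output index through the thread-to-output map, and hands both to the worklet body. ThreadIndicesSketch, GetThreadIndices, and RunMaskedTask are hypothetical simplifications, and a std::vector replaces the array portal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Id = std::int64_t; // stand-in for vtkm::Id

// Hypothetical simplified ThreadIndices: with a mask, the thread index and
// the output index are no longer the same value.
struct ThreadIndicesSketch
{
  Id ThreadIndex;
  Id OutputIndex;
};

// Stand-in for a worklet's GetThreadIndices with the proposed extra argument
// (a std::vector here instead of a portal to the ThreadToOutputMap).
ThreadIndicesSketch GetThreadIndices(Id threadIndex, const std::vector<Id>& threadToOutput)
{
  return ThreadIndicesSketch{ threadIndex, threadToOutput[static_cast<std::size_t>(threadIndex)] };
}

// Serial stand-in for a Task: one iteration per thread. The "worklet body"
// (writing the thread index) only touches outputs listed in the map.
void RunMaskedTask(const std::vector<Id>& threadToOutput, std::vector<Id>& output)
{
  for (Id thread = 0; thread < static_cast<Id>(threadToOutput.size()); ++thread)
  {
    ThreadIndicesSketch indices = GetThreadIndices(thread, threadToOutput);
    output[static_cast<std::size_t>(indices.OutputIndex)] = indices.ThreadIndex;
  }
}
```

Running this with the map {1, 4} over a six-entry output writes only entries 1 and 4 and leaves the rest untouched.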

Interaction Between Mask and Scatter

Although it may seem odd, it should work fine to mix scatters and masks. The scatter will first be applied to the input to generate a (potential) list of output elements. The mask will then be applied to these output elements.
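One way to see that the two compose is to chain their maps: the mask maps each thread to an output, and the scatter maps each output back to the input it visits. The sketch assumes the scatter exposes an output-to-input map (as counting-style scatters do); the function name ThreadToInput is hypothetical and std::vector stands in for array handles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Id = std::int64_t; // stand-in for vtkm::Id

// For each thread, finds the input element it visits by composing the mask's
// thread-to-output map with the scatter's output-to-input map.
std::vector<Id> ThreadToInput(const std::vector<Id>& threadToOutput,
                              const std::vector<Id>& outputToInput)
{
  std::vector<Id> result;
  result.reserve(threadToOutput.size());
  for (Id outputIndex : threadToOutput)
  {
    result.push_back(outputToInput[static_cast<std::size_t>(outputIndex)]);
  }
  return result;
}
```

For example, if a scatter gives three inputs 2, 0, and 3 outputs respectively, its output-to-input map is {0, 0, 2, 2, 2}. A mask that keeps outputs 1 and 3 then runs two threads, which visit inputs 0 and 2.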

Other Names

Is Mask the right name for this feature? Should it instead be called Stencil? Both concepts are similar, and both come from painting. The difference is that a mask covers a section of the canvas to prevent paint from touching that part, whereas a stencil covers everything except the region to fill. In terms of how we select the items, the two are essentially synonymous. Mask might be better simply because the word looks more different from scatter and is therefore less likely to be confused with it.

Another possible name is Permute. However, the word permute is already used a lot in VTK-m, and the intention of the feature is really to remove items in the output, not shift around indices.

Edited Jan 14, 2019 by Kenneth Moreland