Design for filter support for multiple small blocks
When a filter is run on a vtkm::cont::PartitionedDataSet, it can be advantageous to process multiple blocks in parallel. For example, with many small blocks, it could be more efficient to run multiple blocks in parallel on the GPU (or multi-core CPU). In cases where multiple GPUs are available, it might be advantageous to run one block on each GPU. Combinations of these two situations are also interesting use cases.
This functionality is part of the discussion in the filter redesign issue (#601 (closed)).
Exploration for this has started in !2437 (closed). It consists of the following steps:
First, a thread-safe container, vtkm::filter::DataSetQueue, was created to hold a set of vtkm::cont::DataSet objects.
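To make the role of the queue concrete, here is a minimal sketch of a mutex-guarded task queue. It is not the VTK-m implementation: `TaskQueue` is a hypothetical stand-in for vtkm::filter::DataSetQueue, and the element type `T` plays the role of vtkm::cont::DataSet. The method names `Push`, `GetTask`, and `Get` mirror the calls used in this issue; their exact signatures are assumptions.

```cpp
#include <cassert>
#include <deque>
#include <iterator>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for vtkm::filter::DataSetQueue.
// T plays the role of vtkm::cont::DataSet.
template <typename T>
class TaskQueue
{
public:
  TaskQueue() = default;

  // Fill the queue with an initial set of items (the input blocks).
  explicit TaskQueue(const std::vector<T>& items)
    : Items(items.begin(), items.end())
  {
  }

  // Append one item; safe to call from several threads at once.
  void Push(T item)
  {
    std::lock_guard<std::mutex> lock(this->Mutex);
    this->Items.push_back(std::move(item));
  }

  // Pop the next item into `out`. Returns false once the queue is empty.
  bool GetTask(T& out)
  {
    std::lock_guard<std::mutex> lock(this->Mutex);
    if (this->Items.empty())
    {
      return false;
    }
    out = std::move(this->Items.front());
    this->Items.pop_front();
    return true;
  }

  // Move out everything collected so far and leave the queue empty.
  std::vector<T> Get()
  {
    std::lock_guard<std::mutex> lock(this->Mutex);
    std::vector<T> result(std::make_move_iterator(this->Items.begin()),
                          std::make_move_iterator(this->Items.end()));
    this->Items.clear();
    return result;
  }

private:
  std::mutex Mutex;
  std::deque<T> Items;
};
```

Because every access takes the same lock, several worker threads can call `GetTask` and `Push` concurrently without any coordination of their own.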
Inside the internals of a filter launch, if a filter is to be run by numThreads threads, the following is run:
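vtkm::filter::DataSetQueue inputQueue(input), outputQueue;
std::vector<std::thread> threads;
for (int i = 0; i < numThreads; i++)
{
  //Each worker drains inputQueue and pushes its results to outputQueue.
  std::thread t(RunFilter, i, self, policy, std::ref(inputQueue), std::ref(outputQueue));
  threads.push_back(std::move(t));
}
//Wait for all workers to finish before collecting the results.
for (auto& t : threads)
  t.join();
output = outputQueue.Get();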
RunFilter, which is templated on the filter type (Derived), does the following:
//Create a deep copy of the filter so that no memory is shared.
Derived* clone = static_cast<Derived*>(self->Clone());
vtkm::cont::DataSet ds;
while (inputQueue.GetTask(ds))
{
  auto outDS = RunTheFilter(clone, ds, policy);
  CallMapFieldOntoOutput(clone, ds, outDS, policy);
  //Push the result (not the input) onto the output queue.
  outputQueue.Push(std::move(outDS));
}
delete clone;
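The launch loop and the worker body above can be put together in a small, self-contained sketch. Everything here is a toy under stated assumptions: `IntQueue` is a hypothetical stand-in for DataSetQueue, an `int` stands in for a block, and `ScaleFilter` is a made-up filter whose `Clone` returns a private copy. It only illustrates the clone-per-thread, drain-the-queue pattern, not the VTK-m code.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical mutex-guarded queue standing in for DataSetQueue.
class IntQueue
{
public:
  IntQueue() = default;
  explicit IntQueue(const std::vector<int>& v) : Items(v.begin(), v.end()) {}

  void Push(int x)
  {
    std::lock_guard<std::mutex> lock(this->Mutex);
    this->Items.push_back(x);
  }

  // Returns false once no tasks remain.
  bool GetTask(int& out)
  {
    std::lock_guard<std::mutex> lock(this->Mutex);
    if (this->Items.empty())
    {
      return false;
    }
    out = this->Items.front();
    this->Items.pop_front();
    return true;
  }

  std::vector<int> Get()
  {
    std::lock_guard<std::mutex> lock(this->Mutex);
    return std::vector<int>(this->Items.begin(), this->Items.end());
  }

private:
  std::mutex Mutex;
  std::deque<int> Items;
};

// Hypothetical filter: multiplies each "block" (an int here) by Factor.
struct ScaleFilter
{
  int Factor = 1;
  std::unique_ptr<ScaleFilter> Clone() const
  {
    return std::unique_ptr<ScaleFilter>(new ScaleFilter(*this));
  }
  int Execute(int block) const { return block * this->Factor; }
};

// Worker body: clone the filter once, then drain the input queue.
void RunFilter(const ScaleFilter* self, IntQueue& input, IntQueue& output)
{
  std::unique_ptr<ScaleFilter> clone = self->Clone();
  int block = 0;
  while (input.GetTask(block))
  {
    output.Push(clone->Execute(block));
  }
}

// Launch numThreads workers, wait for them, and collect the results.
std::vector<int> RunThreaded(const ScaleFilter& filter,
                             const std::vector<int>& blocks,
                             int numThreads)
{
  IntQueue inputQueue(blocks), outputQueue;
  std::vector<std::thread> threads;
  for (int i = 0; i < numThreads; i++)
  {
    threads.emplace_back(RunFilter, &filter, std::ref(inputQueue), std::ref(outputQueue));
  }
  for (auto& t : threads)
  {
    t.join();
  }
  return outputQueue.Get();
}
```

In this sketch the results arrive in completion order, not input order. If block order matters, one option is to queue (index, block) pairs and sort the output by index afterwards.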
To make this all work, Filter::Clone needs to create a complete, deep copy of the filter. If not, the threads will stomp on each other's memory. This is particularly a problem with ArrayHandles, where a copy only copies the pointer to the underlying data. The Clone method is not ideal and is error-prone: if a filter gains new state but its Clone method is not updated to match, the clone will be incomplete. Again, a cleaner solution to this will be part of the filter redesign mentioned above.
Clone is a virtual method that must be implemented by derived classes. As an example, the Clone for Contour is shown below:
Contour* Clone() const override
{
  //Create a new filter.
  Contour* clone = new Contour;
  //Copy the filter state (attributes) from this into the clone.
  clone->CopyStateFrom(this);
  return clone;
}
The CopyStateFrom for Contour is shown below:
void CopyStateFrom(const Contour* contour)
{
  //Copy the base class data into the clone.
  this->FilterDataSetWithField<Contour>::CopyStateFrom(contour);
  //Copy the state from the contour filter.
  this->IsoValues = contour->IsoValues;
  this->GenerateNormals = contour->GenerateNormals;
  this->AddInterpolationEdgeIds = contour->AddInterpolationEdgeIds;
  this->ComputeFastNormalsForStructured = contour->ComputeFastNormalsForStructured;
  this->ComputeFastNormalsForUnstructured = contour->ComputeFastNormalsForUnstructured;
  this->NormalArrayName = contour->NormalArrayName;
  this->InterpolationEdgeIdsArrayName = contour->InterpolationEdgeIdsArrayName;
}
Again, CopyStateFrom is not an ideal solution. If new member data are added to a filter and they are not copied in this method, then the cloned filter will run with incomplete state and the threaded results will be wrong. Further, if a member that shares memory (e.g., an ArrayHandle) is assigned in this method, the clones will share that memory and multiple threads can overwrite it.
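The shared-memory hazard can be illustrated with a toy example. This is only a sketch: `SharedArray` is a `std::shared_ptr` used as a hypothetical stand-in for a handle whose copy shares the underlying buffer, and `ToyFilter` with its two copy methods is invented for illustration and is not part of VTK-m.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical handle with shallow-copy semantics: copying the handle
// copies the pointer, not the data it points to.
using SharedArray = std::shared_ptr<std::vector<double>>;

struct ToyFilter
{
  std::string OutputName;
  SharedArray Scratch = std::make_shared<std::vector<double>>();

  // Shallow copy: both filters end up pointing at the same buffer,
  // so writes from one thread are visible to (and race with) the other.
  void CopyStateShallow(const ToyFilter* other)
  {
    this->OutputName = other->OutputName;
    this->Scratch = other->Scratch;
  }

  // Deep copy: allocate a fresh buffer and copy the values into it,
  // so the clone owns its memory.
  void CopyStateDeep(const ToyFilter* other)
  {
    this->OutputName = other->OutputName;
    this->Scratch = std::make_shared<std::vector<double>>(*other->Scratch);
  }
};
```

With the shallow version, a write through the clone's `Scratch` also changes the original filter's data; with the deep version, the two filters are independent, which is what a per-thread clone needs.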
Additionally, not all filters can be threaded in this way (e.g., particle advection filters). The Filter class therefore also provides a virtual CanThread method, which returns false by default and which derived classes can override:
virtual bool CanThread() const
{
  return false;
}
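As a rough sketch of how such an opt-in flag might be used, the example below overrides CanThread in one toy filter and leaves the default in another. The class names `FilterBase`, `ContourLike`, and `AdvectionLike`, and the `ChooseExecutionPath` helper, are hypothetical; how the real filter launch consults CanThread is not described in this issue.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class mirroring the default shown above.
struct FilterBase
{
  virtual ~FilterBase() = default;
  virtual bool CanThread() const { return false; }
};

// A filter that is safe to clone and run per thread opts in.
struct ContourLike : FilterBase
{
  bool CanThread() const override { return true; }
};

// A filter that cannot be threaded this way keeps the default.
struct AdvectionLike : FilterBase
{
};

// Hypothetical dispatch: use the threaded path only when the filter
// opts in and more than one thread was requested.
std::string ChooseExecutionPath(const FilterBase& filter, int numThreads)
{
  return (filter.CanThread() && numThreads > 1) ? "threaded" : "serial";
}
```

Defaulting to false means a filter that has not been reviewed for thread safety falls back to the serial path.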