Handling a vtkDataSet in a vtkImplicitArray backend

In short

Using vtkImplicitArray reduces memory usage, at the cost of computing the value each time it is requested. In some use cases, this computation is done by forwarding the request to a vtkDataSet, which means we end up with a vtkDataArray that references a vtkDataSet.
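
To make the trade-off concrete, here is a minimal stand-alone model in plain C++. It is not the VTK class: `ImplicitArrayModel` and `SquareBackend` are hypothetical names introduced only for illustration. The array stores a backend functor instead of a value buffer and calls it on every read.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for vtkImplicitArray: no value buffer, only a backend.
template <typename BackendT>
class ImplicitArrayModel
{
public:
  ImplicitArrayModel(BackendT backend, std::size_t size)
    : Backend(std::move(backend))
    , Size(size)
  {
  }

  // Every read re-runs the backend: memory is saved, CPU time is spent.
  auto GetValue(std::size_t index) const { return this->Backend(index); }
  std::size_t GetNumberOfValues() const { return this->Size; }

private:
  BackendT Backend;
  std::size_t Size;
};

// Example backend (assumption): the value is a pure function of the index.
struct SquareBackend
{
  double operator()(std::size_t index) const { return static_cast<double>(index * index); }
};

// Usage: a "million-value" array that never allocates a million values.
inline void Example()
{
  ImplicitArrayModel<SquareBackend> array(SquareBackend{}, 1000000);
  assert(array.GetNumberOfValues() == 1000000);
  assert(array.GetValue(12) == 144.0); // computed on demand, never stored
}
```

The lifetime question discussed below appears as soon as the backend needs something other than the index, such as a dataset, to compute its value.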

Example

For instance, in !11052, we try to modify CellValidator.

In pseudo-code, the backend looks like this:

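struct ValidatorBackend
{
  short operator()(const vtkIdType index) const
  {
    // vtkDataSet::GetCell(id, genericCell) fills a caller-provided cell.
    vtkNew<vtkGenericCell> cell;
    this->DataSet->GetCell(index, cell);
    vtkCellValidator::State state = vtkCellValidator::Check(cell, this->Tolerance);
    // The array stores the validation state as a short.
    return static_cast<short>(state);
  }

  // Raw pointer: nothing ties the dataset lifetime to the backend.
  vtkDataSet* DataSet;
  double Tolerance;
};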

Problem

This leads to several questions about the lifetime of the vtkDataSet and the vtkImplicitArray itself.

From @spiros.tsalikis :

1. If you keep the dataset pointer raw, it's gonna be a dangling pointer if the dataset is deallocated.
2. If you use a weak pointer, then the array won't be able to return results if the dataset is deallocated.
3. If you use a smart pointer, but you have requested for the dataset to be deallocated, then it won't be deallocated, which i don't think it's a great idea. Also you would need to resolve cycling dependency between the two
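
The options above can be sketched outside VTK with standard smart pointers. This is only an analogy: `std::weak_ptr` and `std::shared_ptr` stand in for vtkWeakPointer and vtkSmartPointer (VTK uses intrusive reference counting, so details differ), and `DataSetModel`, `WeakBackend` and `OwningBackend` are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a vtkDataSet; only what the sketch needs.
struct DataSetModel
{
  int CellCount = 3;
};

// Weak reference: the dataset can be destroyed, but reads then fail.
struct WeakBackend
{
  std::weak_ptr<DataSetModel> DataSet;

  // Returns false when the dataset no longer exists.
  bool TryGetCellCount(int& out) const
  {
    if (auto locked = this->DataSet.lock())
    {
      out = locked->CellCount;
      return true;
    }
    return false;
  }
};

// Owning reference: reads always work, but the dataset is kept alive.
struct OwningBackend
{
  std::shared_ptr<DataSetModel> DataSet;
  int GetCellCount() const { return this->DataSet->CellCount; }
};

// The raw pointer case is intentionally not exercised: reading through it
// after the dataset is freed would be undefined behavior.

inline bool WeakBackendSurvivesReset()
{
  auto dataset = std::make_shared<DataSetModel>();
  WeakBackend backend{ dataset };
  dataset.reset(); // the "user" releases the dataset
  int count = 0;
  return backend.TryGetCellCount(count); // false: nothing left to query
}

inline long OwnersAfterReset()
{
  auto dataset = std::make_shared<DataSetModel>();
  OwningBackend backend{ dataset };
  std::weak_ptr<DataSetModel> probe = dataset;
  dataset.reset(); // the "user" releases the dataset
  assert(backend.GetCellCount() == 3); // still readable through the backend
  return probe.use_count(); // the backend is now the only owner
}
```

In this model the weak variant loses its data source and the owning variant silently extends the dataset lifetime, which matches the concerns raised in the quote.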

Suggestion

Options 1 and 2 both lead to an invalid array once the dataset is gone, so neither is acceptable. We therefore need a way to properly handle this cyclic dependency.

We propose to add some logic in the backend:

use a weak pointer, so one can still destroy the dataset (this breaks the cycle)
observe the DeleteEvent of the dataset
on dataset deallocation, allocate the full array in memory (this keeps the array valid). This only happens if the array is still held by someone other than the dataset.

This solution allows us to:

keep the memory optimization
have no impact on the "classic" case, where the data array is only owned by the dataset (through its dataset attributes)
keep the array usable and valid when the dataset is destroyed

Implementation details

As this problem occurs in several places, we will create a dedicated vtkImplicitArray backend that handles this logic in one place. The relevant filters can then create their own backend that only adds the operator() logic, without duplicating the dataset management.
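
A possible shape for this split, sketched in plain C++ rather than against the VTK API. Every name here (`DataSetModel`, `DataSetBackendBase`, `LargeCellBackend`) is hypothetical, and the destructor callback only imitates observing the DeleteEvent. As a simplification, the filter-specific backend passes its computation to the base as a function instead of overriding operator().

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for vtkDataSet: cell data plus a DeleteEvent-like hook.
class DataSetModel
{
public:
  using DeleteCallback = std::function<void(const DataSetModel&)>;
  explicit DataSetModel(std::vector<double> sizes) : CellSizes(std::move(sizes)) {}
  // Observers run first, while the cell data is still readable.
  ~DataSetModel()
  {
    for (const auto& callback : this->DeleteObservers) { callback(*this); }
  }
  void AddDeleteObserver(DeleteCallback cb) { this->DeleteObservers.push_back(std::move(cb)); }
  std::size_t GetNumberOfCells() const { return this->CellSizes.size(); }
  double GetCellSize(std::size_t id) const { return this->CellSizes[id]; }

private:
  std::vector<double> CellSizes;
  std::vector<DeleteCallback> DeleteObservers;
};

// Shared dataset management: weak reference, delete observer, fallback cache.
template <typename ValueT>
class DataSetBackendBase
{
public:
  using ComputeFn = std::function<ValueT(const DataSetModel&, std::size_t)>;

  DataSetBackendBase(const std::shared_ptr<DataSetModel>& dataset, ComputeFn compute)
    : Shared(std::make_shared<State>())
  {
    this->Shared->DataSet = dataset;
    this->Shared->Compute = std::move(compute);
    // The observer holds the state weakly: if nobody uses the backend
    // anymore when the dataset dies, nothing is materialized.
    std::weak_ptr<State> weakState = this->Shared;
    dataset->AddDeleteObserver([weakState](const DataSetModel& dying) {
      if (auto state = weakState.lock())
      {
        for (std::size_t id = 0; id < dying.GetNumberOfCells(); ++id)
        {
          state->Cache.push_back(state->Compute(dying, id));
        }
        state->Materialized = true;
      }
    });
  }

  ValueT operator()(std::size_t index) const
  {
    if (this->Shared->Materialized) { return this->Shared->Cache[index]; }
    auto dataset = this->Shared->DataSet.lock();
    assert(dataset && "dataset gone without materialization");
    return this->Shared->Compute(*dataset, index);
  }
  bool IsMaterialized() const { return this->Shared->Materialized; }

private:
  struct State
  {
    std::weak_ptr<DataSetModel> DataSet;
    ComputeFn Compute;
    std::vector<ValueT> Cache;
    bool Materialized = false;
  };
  std::shared_ptr<State> Shared; // shared so that backend copies stay in sync
};

// A filter-specific backend only supplies the per-cell computation.
struct LargeCellBackend : DataSetBackendBase<short>
{
  LargeCellBackend(const std::shared_ptr<DataSetModel>& dataset, double threshold)
    : DataSetBackendBase<short>(dataset,
        [threshold](const DataSetModel& data, std::size_t id) -> short
        { return data.GetCellSize(id) > threshold ? 1 : 0; })
  {
  }
};

inline bool BackendSurvivesDatasetDeletion()
{
  auto dataset = std::make_shared<DataSetModel>(std::vector<double>{ 0.5, 2.0, 3.5 });
  LargeCellBackend backend(dataset, 1.0);
  const bool lazy = !backend.IsMaterialized() && backend(1) == 1;
  dataset.reset(); // last owner gone: the observer fills the cache
  return lazy && backend.IsMaterialized() && backend(0) == 0 && backend(2) == 1;
}
```

The state is held through a shared pointer because a backend functor is typically copied into the array, and all copies need to see the same cache. In VTK itself the equivalent would presumably rely on vtkWeakPointer and an observer on vtkCommand::DeleteEvent, but that wiring is not shown here.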

This was discussed in the following MR: