Redesign ArrayHandles to Access data in Void Buffers
The current implementation of ArrayHandle
is meant to be very generic. To define an ArrayHandle
, you actually create a Storage
class that maintains the data and provides portals to access it (on the host). Because the Storage
can provide any type of data structure it wants, you also need to define an ArrayTransfer
that describes how to move the ArrayHandle
to and from a device. It also has to be repeated for every translation unit that uses them.
This is a very powerful mechanism. However, one of the major problems with this approach is that every ArrayHandle
type needs to have a separate compile path for every value type crossed with every device. Because of this limitation, the ArrayHandle
for the basic storage has a special implementation that manages the actual data allocation and movement as void *
arrays. In this way all the data management can be compiled once and put into the vtkm_cont
library. This has dramatically improved the VTK-m compile time.
This design proposal is an attempt to replicate the basic ArrayHandle
's success to all other storage types. The basic idea is to make the implementation of ArrayHandle
storage slightly less generic. Instead of requiring it to manage the data it stores, it instead just builds ArrayPortal
s from void
pointers that it is given. The management of void
pointers can be done in non-templated classes that are compiled into a library.
The following are the planned designs of the new ArrayHandle
structure.
Buffer
Key to these changes is the introduction of a vtkm::cont::internal::Buffer
object. As the name implies, the Buffer
object manages a single block of bytes. Buffer
is agnostic to the type of data being stored. It only knows the length of the buffer in bytes. It is responsible for allocating space on the host and any devices as necessary and for transferring data among them. (Since Buffer
knows nothing about the type of data, a precondition of VTK-m would be that the host and all devices have to have the same endian.)
The idea of the Buffer
object is similar in nature to the existing vtkm::cont::internal::ExecutionArrayInterfaceBasicBase
except that it will manage a buffer of data among the control and all devices rather than in one device through a templated subclass.
As will be explained in more detail later, ArrayHandle
will hold some fixed number of Buffer
objects. (The number can be zero for implicit ArrayHandle
s.) Because all the interaction with the devices happen through Buffer
, it will no longer be necessary to compile any reference to ArrayHandle
for devices (e.g. you won’t have to use nvcc just because the code links ArrayHandle.h
).
The following sections describe the interface to Buffer
.
Constructors
The default constructor creates an empty buffer.
The copy constructors/operators share the data in the Buffer
like a shared pointer.
Buffer
will also have a constructor that takes a void *
array, the size of that array in bytes, a deleter object that knows how to delete the array, and a DeviceAdapterId
to specify on which device the data exists. The deleter object will have to be stored in polymorphic class (details left to the reader). Allowing to specify the DeviceAdapterId
will make it straightforward to pass in situ data created on a GPU. If the DeviceAdapterId
is not specified (and/or a special tag such as DeviceAdapterTagUndefined
), then the void *
is considered as pointing to host memory.
Allocate
The Buffer::SetNumberOfBytes
method takes the size of the memory to allocate in bytes and a flag that specifies whether any existing data in the buffer should be preserved.
As an implementation note, the allocation should be lazy. That is, when SetNumberOfBytes
is called, it should not immediately allocate data. Instead, it should mark a "dirty" flag and defer the actual allocation to when data is specified for a specific device (or host). Allocate
could actually instead delete the buffers if the preserve data flag is off.
Buffer::GetNumberOfBytes
will return the number of bytes given in the last SetNumberOfBytes
.
Retrieving pointers
The void *
buffer can be retrieved via a ReadPointer
and WritePointer
methods. Each method takes 2 arguments: the device for the buffer and a Token
object. The method will make the appropriate allocations and data copies as necessary.
Storage
The vtkm::cont::internal::Storage
class will change dramatically. Although an instance will be kept, the intention is for Storage
itself to be a stateless object. It will manage its data through Buffer
objects provided from the ArrayHandle
.
That said, it is possible for Storage
to have some state. For example, an ArrayHandlePermutation
will probably need a reference to the underlying value array to keep track of how many values it has.
Number of buffers
The Storage
contains its data in a fixed number of Buffer
objects. It communicates the number of Buffer
s it needs by declaring a constexpr
value named NUMBER_OF_BUFFERS
.
static constexpr vtkm::IdComponent NUMBER_OF_BUFFERS = 1;
Buffer sizes
Storage
contains a method named GetBufferSizes
to determine the size of the buffers (in bytes) for a given size of the array (in number of values). The method takes in the number of values desired and a vtkm::Vec<vtkm::Id, NUMBER_OF_BUFFERS>
reference to communicate the sizes. This short array will be pre-filled with the existing size of the buffers. This way a Storage
that cannot be resized can check the existing buffer sizes before creating an error.
void GetBufferSizes(vtkm::Id numValues, vtkm::Vec<vtkm::Id, 1>& bufferSizes)
{
bufferSizes[0] = static_cast<vtkm::Id>(sizeof(T)) * numValues;
}
Along the same lines, Storage
needs a method named CheckBufferSizes
that verifies that all the buffer sizes are correct. It returns a bool
whether it is correct or not.
bool CheckBufferSizes(vtkm::Id numValues, const vtkm::Vec<vtkm::Id, 1>& bufferSizes)
{
return (bufferSizes[0] != static_cast<vtkm::Id>(sizeof(T)) * numValues);
}
Portal creation
As with before, Storage
must define the portal types used to access the data. For more readability, these will be named ReadPortalType
and WritePortalType
.
using ReadPortalType = vtkm::cont::internal::ArrayPortalFromIterators<const T*>;
using WritePortalType = vtkm::cont::internal::ArrayPortalFromIterators<T*>;
There will also be a pair of methods named CreateReadPortal
and CreateWritePortal
to create read-only and read-write portals. They take a number of values, a vtkm::Vec
of void *
of buffer pointers, and a vtkm::Vec
that indicates each buffer size.
ReadPortalType CreateReadPortal(vtkm::Id numValues,
const vtkm::Vec<const void *, 1>& buffers,
const vtkm::Vec<vtkm::Id, 1>& bufferSizes)
{
VTKM_ASSERT(this->CheckBufferSizes(numValues, bufferSizes);
const T* iteratorBegin = reinterpret_cast<const T*>(buffers[0]);
return ReadPortalType(iteratorBegin, iteratorBegin + numValues);
}
WritePortalType CreateWritePortal(vtkm::Id numValues,
const vtkm::Vec<void *, 1>& buffers,
const vtkm::Vec<vtkm::Id, 1>& bufferSizes)
{
VTKM_ASSERT(this->CheckBufferSizes(numValues, bufferSizes);
T* iteratorBegin = reinterpret_cast<T*>(buffers[0]);
return WritePortalType(iteratorBegin, iteratorBegin + numValues);
}
Note that the buffers passed to these methods may actually be located in the execution environment and inaccessible from the current thread.
ArrayTransport
The vtkm::cont::internal::ArrayTransfer
class will be removed completely. All data transfers will be handled internally with the Buffer
object
Portals
A big change for this design is that the type of a portal for an ArrayHandle
will be the same for all devices and the host. Thus, we no longer need specialized versions of portals for each device. We only have one portal type. And since they are constructed from void *
pointers, one method can create them all.
ArrayHandle
In this ArrayHandle
redesign, much of the responsibility that was previously placed in the Storage
and ArrayTransport
classes are moved into the main ArrayHandle
. However, since this management centers around void *
pointers, it can mostly happen in the vtkm_cont library rather than having to recompile it for every type, storage, and translation unit.
The ArrayHandle
maintains a vtkm::Vec<vtkm::cont::internal::Buffer, N>
(where N
is the NUMBER_OF_BUFFERS
from the Storage
class) for the actual data and memory management. The ArrayHandle
also maintains the number of values. The ArrayHandle
works with the Storage
class to manage the buffers.
Allocate
As managed through the Buffer
objects, the allocation in ArrayHandle
s will be lazy. Thus, Allocate
will not actually allocate memory (at that time). For this reason, we will deprecate the Allocate
method and replace it with SetNumberOfValues
.
In the ArrayHandle
’s SetNumberOfValues
method, the ArrayHandle
calls the Storage::GetBufferSizes
method and uses that result to reallocate the Buffer
s.
template <typename T, typename S>
VTKM_CONT void ArrayHandle<T, S>::Allocate(
vtkm::Id numValues, vtkm::CopyFlag preserve = vtkm::CopyFlag::Off)
{
if (this->NumberOfValues = numValues)
{
return;
}
vtkm::Vec<vtkm::Id, NUMBER_OF_BUFFERS> sizes;
this->Storage.GetBufferSizes(numValues, sizes);
// Note: the real implementation should be more careful about exceptions.
// If an exception is raised, the buffers may need to be "repaired" to their
// original size.
for (vtkm::Id index = 0; index < NUMBER_OF_BUFFERS; ++index)
{
this->Buffers[index].Allocate(sizes[index], preserve);
}
this->NumberOfValues = numValues;
}
Note that SetNumberOfValues
takes a new preserve
flag that indicates whether any existing data should be kept. If preserve
is set to vtkm::CopyFlag::On
, then any data that currently exists in the array and fits within the new size will still exist after any reallocation. This might mean moving the data around. If preserve
is set to vtkm::CopyFlag::Off
(the default), then the memory in the new allocation should be considered uninitialized (the previous behavior of Allocate
).
Because of this feature Shrink
will also be deprecated. Instead, SetNumberOfValues
with the preserve
flag set to vtkm::CopyFlag::On
should be called.
Advantages
The ArrayHandle
interface should not change significantly for external uses, but this redesign offers several advantages.
Faster Compiles
Because the memory management is contained in a non-templated Buffer
class, it can be compiled once in a library and used by all template instances of ArrayHandle
. It should have similar compile advantages to our current specialization of the basic ArrayHandle
, but applied to all types of ArrayHandle
s.
Fewer Templates
Hand-in-hand with faster compiles, the new design should require fewer templates and template instances. We have immediately gotten rid of ArrayTransport
. Storage
is also much shorter. Because all ArrayPortal
s are the same for every device and the host, we need many fewer versions of those classes. In the device adapter, we can probably collapse the three ArrayManagerExecution
classes into a single, much simpler class that does simple memory allocation and copy.
Fewer files need to be compiled for CUDA
Including ArrayHandle.h
no longer adds code that compiles for a device. Thus, we should no longer need to compile for a specific device adapter just because we access an ArrayHandle
. This should make it much easier to achieve our goal of Matt’s "firewall". That is, code that just calls VTK-m filters does not need to support all its compilers and flags.
Simpler ArrayHandle specialization
The newer code should simplify the implementation of special ArrayHandle
s a bit. You need only implement an ArrayPortal
that operates on one or more void *
arrays and a simple Storage
class.
Out of band memory sharing
With the current version of ArrayHandle
, if you want to take data from one ArrayHandle
you pretty much have to create a special template to wrap another ArrayHandle
around that. With this new design, it is possible to take data from one ArrayHandle
and give it to another ArrayHandle
of a completely different type. You can’t do this willy-nilly since different ArrayHandle
types will interpret buffers differently. But there can be some special important use cases.
One such case could be an ArrayHandle
that provides strided access to a buffer. (Let’s call it ArrayHandleStride
.) The idea is that it interprets the buffer as an array for a particular type (like a basic ArrayHandle
) but also defines a stride, skip, and repeat so that given an index it looks up the value ((index / skip) % repeat) * stride
. The point is that it can take an AoS array of tuples and represent an array of one of the components.
The point would be that if you had a VariantArrayHandle
or Field
, you could pull out an array of one of the components as an ArrayHandleStride
. An ArrayHandleStride<vtkm::Float32>
could be used to represent that data that comes from any basic ArrayHandle
with vtkm::Float32
or a vtkm::Vec
of that type. It could also represent data from an ArrayHandleCartesianProduct
and ArrayHandleSoA
. We could even represent an ArrayHandleUniformPointCoordinates
by just making a small array. This allows us to statically access a whole bunch of potential array storage classes with a single type.
Potentially faster device transfers
There is currently a fast-path for basic ArrayHandle
s that does a block cuda memcpy between host and device. But for other ArrayHandle
s that do not defer their ArrayTransfer
to a sub-array, the transfer first has to copy the data into a known buffer.
Because this new design stores all data in Buffer
objects, any of these can be easily and efficiently copied between devices.
Disadvantages
This new design gives up some features of the original ArrayHandle
design.
Can only interface data that can be represented in a fixed number of buffers
Because the original ArrayHandle
design required the Storage
to completely manage the data, it could represent it in any way possible. In this redesign, the data need to be stored in some fixed number of memory buffers.
This is a pretty open requirement. I suspect most data formats will be storable in this. The user’s guide has an example of data stored in a std::deque
that will not be representable. But that is probably not a particularly practical example.
VTK-m would only be able to support hosts and devices with the same endian
Because data are transferred as void *
blocks of memory, there is no way to correct words if the endian on the two devices does not agree. As far as I know, there should be no issues with the proposed ECP machines.
If endian becomes an issue, it might be possible to specify a word length in the Buffer
. That would assume that all numbers stored in the Buffer
have the same word length.
ArrayPortals must be completely recompiled in each translation unit
We can declare that an ArrayHandle
does not need to include the device adapter header files in part because it no longer needs specialized ArrayPortal
s for each device. However, that means that a translation unit compiled with the host compiler (say gcc) will produce different code for the ArrayPortal
s than those with the device compiler (say nvcc). This could lead to numerous linking problems.
To get around these issues, we will probably have to enforce no exporting of any of the ArrayPotal
symbols and force them all to be recompiled for each translation unit. This will serve to increase the compile times a bit. We will probably also still encounter linking errors as there would be no way to enforce this requirement.
Cannot have specialized portals for the control environment
Because the new design unifies ArrayPortal
types across control and execution environments, it is no longer possible to have a special version for the control environment to manage resources. This will require removing some recent behavior of control portals such as with MR !1988 (merged).
Currently, the differences only do checks, so we probably could live without.
SAND 2020-3455 O