BitField Support
Since VTKm targets platforms with highly constrained memory pools (e.g. CUDA), being able to represent a dense set of indexed booleans as a bitfield is valuable.
This issue is an early draft of what an implementation of BitField
s would look like in VTK-m, and what control and execution machinery would be needed to implement these.
Control Implementation and API
In the control environment, the BitField
implementation would simply be an ArrayHandle
that points to a memory buffer. The main question for the implementation regards the ValueType of this array handle:
- Standardize WordType on UInt32 or UInt64?
- Pro: Simple to implement -- Algorithms can simply use
ArrayHandle<vtkm::WordType>
to represent a bit field. - Con: May be more constrained than optimal (some platforms may be most efficient with 32-bit words, 64-bit words, or even larger words through the use of SIMD instructions).
- Pro: Simple to implement -- Algorithms can simply use
- Add an explicit BitField type that takes any ArrayHandle with Basic storage, and
WordType
is specified on access.- Pro: Best performance available on each platform.
- Restricting the underlying array's Storage to StorageBasic allows us to address contiguous words of arbitrary size.
- Also requires ArrayHandleBasic to align and pad all allocations to a suitable boundary (256 bytes?) so that we don't write to unowned memory.
- Con: More complex to implement.
- Con: Code bloat -- each WordSize will require instantiating a set of templates.
- Pro: Best performance available on each platform.
The second option seems preferable to me. Something like this could work:
template <typename ArrayHandleType>
class BitField
{
using StorageWordType = typename ArrayHandleType::ValueType;
public:
template <typename WordType, typename Device>
BitPortal<ArrayHandlePortal, WordType> PrepareForInput(Device); // etc
};
template <typename ArrayHandlePortal, typename WordType>
class BitPortal
{
public:
// Reads/writes an entire word at a time. Useful for bulk processing algos
// (e.g. CountSetBits).
WordType GetWord(vtkm::Id idx) const;
void SetWord(vtkm::Id idx, WordType word);
// Reads/writes a single bit. Atomically update the storage word.
bool GetBit(vtkm::Id idx) const;
void SetBit(vtkm::Id idx, bool val);
// All of these atomically modify a word at a time, returning the
// old value of the word.
WordType AtomicBitwiseAnd(vtkm::Id wordIdx, WordType);
WordType AtomicBitwiseOr(vtkm::Id wordIdx, WordType);
WordType AtomicBitwiseXor(vtkm::Id wordIdx, WordType);
WordType AtomicCAS(vtkm::Id wordIdx, WordType);
WordType AtomicExchange(vtkm::Id wordIdx, WordType);
};
It would be useful to provide preferred word types that can be easily looked up by algorithms per-device/platform.
Execution Implementation and API
BitFields could be exposed to the Worklet framework by providing specialized execution objects:
-
Bits[In|Out|InOut]
: Maps tobools
in the execution signature. -
BitField[In|Out|InOut]:
Maps to objects similar to the controlBitPortal
for more complex operations.
Algorithms
These would be added to DeviceAdapterAlgorithm
:
-
BitFieldToUnorderedSet
: Return a unique, unordered list of indices indicating which bits are set in the BitField. -
BitFieldPopulationCount
: Counts and returns the number of set bits. -
BitField[And|Or|Xor|Not]
: Performs the indicated operation on two bit sets (output may be one of the inputs).
Helpers
Math.h
would provide the follow helpers templated on WordType. The default implementation will recursively split the word in half until a specialized implementation is found:
vtkm::Int32 PopulationCount(WordType)
vtkm::Int32 FindFirstSetBit(WordType)
vtkm::Int32 CountLeadingZeroBits(WordType)