Commit 81ac8771 authored by Robert Maynard

Merge branch 'release_notes_1.3.0'

parents b6b123e9 57127195
# 0-sample-topic
This is a sample release note for the change in a topic.
Developers should add similar notes for each topic branch
making a noteworthy change. Each document should be named
and titled to match the topic name to avoid merge conflicts.
# Add float version operations for vtkm::Math Pi()
PI-related functions are now evaluated at compile time as `constexpr` functions.
This change also replaces the old `static_cast<T>(vtkm::Pi())` usages with
templated versions and fixes several conversion warnings.
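As a rough illustration of the idea (not VTK-m's actual implementation), a templated `constexpr` function gives each floating-point type its own constant without a `static_cast` at the call site. The `sketch` namespace and the function names below are hypothetical.

```cpp
#include <cassert>

namespace sketch // hypothetical namespace, not part of VTK-m
{

// Returns pi in the requested floating-point type, evaluated at compile time.
template <typename T>
constexpr T Pi()
{
  return static_cast<T>(3.14159265358979323846264338327950288L);
}

// A derived constant built from the templated version; still constexpr.
template <typename T>
constexpr T TwoPi()
{
  return Pi<T>() * static_cast<T>(2);
}

// Because the functions are constexpr, they can be checked by the compiler.
static_assert(Pi<float>() > 3.14f && Pi<float>() < 3.15f, "float pi out of range");
static_assert(TwoPi<double>() > 6.28 && TwoPi<double>() < 6.29, "double 2*pi out of range");

} // namespace sketch
```

Calling `sketch::Pi<float>()` yields a `float` directly, so no narrowing conversion happens at the point of use.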
# Add a release resources API to CellSet and its derived classes
We now offer the ability to unload execution memory from `CellSet` and its derived
classes (`CellSetExplicit`, `CellSetPermutation`, and `CellSetStructured`) using the `ReleaseResourcesExecution` method.
# TryExecuteOnDevice allows for runtime selection of which device to execute on
VTK-m now offers `vtkm::cont::TryExecuteOnDevice` to let the user select
which device to execute a function on at runtime. The original `vtkm::cont::TryExecute`
used the first valid device, which meant users had to modify the runtime state
through the `RuntimeTracker`, which was verbose and unwieldy.
Here is an example of how you can execute a function on the device that an array handle was last executed on:

    struct ArrayCopyFunctor
    {
      template <typename Device, typename InArray, typename OutArray>
      VTKM_CONT bool operator()(Device, const InArray& src, OutArray& dest)
      {
        vtkm::cont::DeviceAdapterAlgorithm<Device>::Copy(src, dest);
        return true;
      }
    };

    template<typename T, typename InStorage, typename OutStorage>
    void SmartCopy(const vtkm::cont::ArrayHandle<T, InStorage>& src, vtkm::cont::ArrayHandle<T, OutStorage>& dest)
    {
      vtkm::cont::DeviceAdapterId devId = src.GetDeviceAdapterId(); // device src was last used on
      bool success = vtkm::cont::TryExecuteOnDevice(devId, ArrayCopyFunctor(), src, dest);
      if (!success)
      {
        vtkm::cont::TryExecute(ArrayCopyFunctor(), src, dest);
      }
    }
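The "try the preferred device, then fall back" control flow can be sketched without VTK-m. Everything in the `sketch` namespace below is a hypothetical stand-in (including the assumption that only two devices are available), meant only to show the dispatch logic rather than the library's internals.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

namespace sketch // hypothetical stand-ins, not VTK-m types
{

enum class DeviceId { Serial, Tbb, Cuda };

// Assumption for this sketch: only Serial and Tbb are usable at runtime.
inline bool IsDeviceAvailable(DeviceId id)
{
  return id == DeviceId::Serial || id == DeviceId::Tbb;
}

// Runs the functor on the requested device only; reports failure otherwise.
template <typename Functor>
bool TryExecuteOnDevice(DeviceId requested, Functor&& functor)
{
  if (!IsDeviceAvailable(requested))
  {
    return false;
  }
  return functor(requested);
}

// Runs the functor on the first available device on which it succeeds.
template <typename Functor>
bool TryExecute(Functor&& functor)
{
  for (DeviceId id : { DeviceId::Cuda, DeviceId::Tbb, DeviceId::Serial })
  {
    if (IsDeviceAvailable(id) && functor(id))
    {
      return true;
    }
  }
  return false;
}

// Copies src to dest, preferring one device, and returns the device actually used.
inline DeviceId CopyPreferring(DeviceId preferred,
                               const std::vector<int>& src,
                               std::vector<int>& dest)
{
  DeviceId used = DeviceId::Serial;
  auto copy = [&](DeviceId id) {
    dest = src;
    used = id;
    return true;
  };
  if (!TryExecuteOnDevice(preferred, copy))
  {
    TryExecute(copy);
  }
  return used;
}

} // namespace sketch
```

With these assumptions, asking for `DeviceId::Tbb` runs there directly, while asking for the unavailable `DeviceId::Cuda` falls through to the first available device.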
# vtkm::cont::Algorithm now can be told which device to use at runtime
The `vtkm::cont::Algorithm` class has been extended to let the user specify
which device to use at runtime. Previously, `Algorithm` would only use the first
enabled device, requiring users to modify the `vtkm::cont::GlobalRuntimeDeviceTracker`
if they wanted a specific device used.
To select a specific device with `vtkm::cont::Algorithm`, pass the `vtkm::cont::DeviceAdapterId`
as the first parameter.

    vtkm::cont::ArrayHandle<double> values;

    //call with no tag, will run on first enabled device
    auto result = vtkm::cont::Algorithm::Reduce(values, 0.0);

    //call with an explicit device tag, will only run on serial
    vtkm::cont::DeviceAdapterTagSerial serial;
    result = vtkm::cont::Algorithm::Reduce(serial, values, 0.0);

    //call with a runtime device ID, will only run on serial
    vtkm::cont::DeviceAdapterId device = serial;
    result = vtkm::cont::Algorithm::Reduce(device, values, 0.0);

# Add a common API for CoordinateSystem to unload execution resources
We now offer the ability to unload execution memory from `ArrayHandleVirtualCoordinate`
and `CoordinateSystem` using the `ReleaseResourcesExecution` method.
`Field` now has a `ReleaseResourcesExecution` method as well.
This commit also fixes a bug where the `ArrayTransfer` of `ArrayHandleVirtualCoordinate`
did not release execution resources properly.
# Allow histogram filter to take custom types
By passing `TypeList` and `StorageList` types into `FieldRangeGlobalCompute`,
upstream users (such as VTK) can pass custom types into the histogram filter.
# Allow disabling/enabling of CUDA managed memory through an environment variable
By setting the environment variable `VTKM_MANAGEDMEMO_DISABLED` to 1,
users can disable CUDA managed memory even when the hardware is capable
of supporting it.
# VTK-m ArrayHandle can now take ownership of a user allocated memory location
Previously memory that was allocated outside of VTK-m was impossible to transfer to
VTK-m as we didn't know how to free it. By extending the ArrayHandle constructors
to support a Storage object that is being moved, we can clearly express that
the ArrayHandle now owns memory it didn't allocate.
Here is an example of how this is done:

    T* buffer = new T[100];
    auto user_free_function = [](void* ptr) { delete[] static_cast<T*>(ptr); };
    vtkm::cont::internal::Storage<T, vtkm::cont::StorageTagBasic>
      storage(buffer, 100, user_free_function);
    vtkm::cont::ArrayHandle<T> arrayHandle(std::move(storage));

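The ownership hand-off can be modeled in plain C++ as a move-only wrapper that stores the user's free function next to the pointer. This is only a sketch of the concept; `OwnedBuffer` and `CountFrees` are hypothetical names and do not reflect how VTK-m's `Storage` is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

namespace sketch // hypothetical names, not VTK-m classes
{

// Move-only owner of a user-allocated buffer plus the function that frees it.
template <typename T>
class OwnedBuffer
{
public:
  using Deleter = std::function<void(void*)>;

  OwnedBuffer(T* data, std::size_t size, Deleter deleter)
    : Data(data), Size(size), Free(std::move(deleter))
  {
  }

  // Moving transfers ownership; the source no longer frees anything.
  OwnedBuffer(OwnedBuffer&& other)
    : Data(other.Data), Size(other.Size), Free(std::move(other.Free))
  {
    other.Data = nullptr;
    other.Size = 0;
  }

  OwnedBuffer(const OwnedBuffer&) = delete;
  OwnedBuffer& operator=(const OwnedBuffer&) = delete;
  OwnedBuffer& operator=(OwnedBuffer&&) = delete;

  ~OwnedBuffer()
  {
    if (this->Data != nullptr && this->Free)
    {
      this->Free(this->Data);
    }
  }

  T* GetData() const { return this->Data; }
  std::size_t GetSize() const { return this->Size; }

private:
  T* Data;
  std::size_t Size;
  Deleter Free;
};

// Returns how many times the user's free function ran for one buffer.
inline int CountFrees()
{
  int frees = 0;
  {
    int* buffer = new int[100];
    OwnedBuffer<int> storage(buffer, 100, [&frees](void* ptr) {
      delete[] static_cast<int*>(ptr);
      ++frees;
    });
    OwnedBuffer<int> owner(std::move(storage)); // ownership moves to owner
    assert(owner.GetData() == buffer);
  } // owner frees the buffer here; the moved-from storage does nothing
  return frees;
}

} // namespace sketch
```

Requiring a moved-in object makes the transfer explicit at the call site: the buffer is freed exactly once, by whichever object holds it last.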
# Allow ArrayHandleTransform to work with ExecObject
Previously, the `ArrayHandleTransform` class only worked with plain old
data (POD) objects as its functors. For simple transforms, this makes sense
since all the data comes from a target `ArrayHandle` that will be sent to
the device through a different path. However, this also requires the
transform to be known at compile time.
There are cases, though, where the functor cannot be a POD object and has
to be built for a specific device. There are numerous reasons for this. One
might be that you need some lookup tables. Another might be that you want to
support a virtual object, which has to be initialized for a particular
device. The standard way to implement this in VTK-m is to create an
"execution object." This actually means that we create a wrapper around
execution objects that inherits from
`vtkm::cont::ExecutionAndControlObjectBase` and contains a
`PrepareForExecution` method and a `PrepareForControl` method.
As an example, consider the use case of a special `ArrayHandle` that takes
the value in one array and returns the index of that value in another
sorted array. We can do that by creating a functor that finds a value in an
array and returns the index.
``` cpp
template <typename ArrayPortalType>
struct FindValueFunctor
{
  ArrayPortalType SortedArrayPortal;

  FindValueFunctor() = default;

  VTKM_CONT FindValueFunctor(const ArrayPortalType& sortedPortal)
    : SortedArrayPortal(sortedPortal)
  { }

  // Returns the index of the first entry that is not less than value.
  VTKM_EXEC vtkm::Id operator()(const typename ArrayPortalType::ValueType& value) const
  {
    vtkm::Id leftIndex = 0;
    vtkm::Id rightIndex = this->SortedArrayPortal.GetNumberOfValues();
    while (leftIndex < rightIndex)
    {
      vtkm::Id middleIndex = (leftIndex + rightIndex) / 2;
      auto middleValue = this->SortedArrayPortal.Get(middleIndex);
      if (middleValue < value)
      {
        leftIndex = middleIndex + 1;
      }
      else
      {
        rightIndex = middleIndex;
      }
    }
    return leftIndex;
  }
};
```
Simple enough, except that the type of `ArrayPortalType` depends on what
device the functor runs on (not to mention its memory might need to be
moved to different hardware). We can now solve this problem by creating an
execution object to set this up for a device. `ArrayHandle`s also need to be
able to provide portals that run in the control environment, and for that
we need a special version of the functor for the control environment.
``` cpp
template <typename ArrayHandleType>
struct FindValueExecutionObject : vtkm::cont::ExecutionAndControlObjectBase
{
  ArrayHandleType SortedArray;
  FindValueExecutionObject() = default;
  VTKM_CONT FindValueExecutionObject(const ArrayHandleType& sortedArray)
    : SortedArray(sortedArray)
  { }

  template <typename Device>
  VTKM_CONT
  FindValueFunctor<typename ArrayHandleType::template ExecutionTypes<Device>::PortalConst>
  PrepareForExecution(Device device) const
  {
    using FunctorType =
      FindValueFunctor<typename ArrayHandleType::template ExecutionTypes<Device>::PortalConst>;
    return FunctorType(this->SortedArray.PrepareForInput(device));
  }

  VTKM_CONT
  FindValueFunctor<typename ArrayHandleType::PortalConstControl>
  PrepareForControl() const
  {
    using FunctorType =
      FindValueFunctor<typename ArrayHandleType::PortalConstControl>;
    return FunctorType(this->SortedArray.GetPortalConstControl());
  }
};
```
Now you can use this execution object in an `ArrayHandleTransform`. It will
automatically be detected as an execution object and be converted to a
functor in the execution environment.
``` cpp
auto transformArray =
  vtkm::cont::make_ArrayHandleTransform(
    inputArray, FindValueExecutionObject<decltype(sortedArray)>(sortedArray));
```
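To check the search logic in isolation, here is a stand-alone sketch of the same lower-bound search running over a `std::vector`. `VectorPortal` is a hypothetical stand-in that only mimics the `Get`/`GetNumberOfValues` portal interface; it is not a VTK-m class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch // hypothetical stand-ins, not VTK-m types
{

using Id = std::int64_t; // stand-in for vtkm::Id

// Minimal read-only portal over a std::vector.
template <typename T>
struct VectorPortal
{
  using ValueType = T;
  const std::vector<T>* Values;

  Id GetNumberOfValues() const { return static_cast<Id>(this->Values->size()); }
  T Get(Id index) const { return (*this->Values)[static_cast<std::size_t>(index)]; }
};

// Finds the index of the first entry in a sorted portal that is not less
// than the given value (the number of entries if there is none).
template <typename PortalType>
struct FindValueFunctor
{
  PortalType SortedArrayPortal;

  Id operator()(const typename PortalType::ValueType& value) const
  {
    Id leftIndex = 0;
    Id rightIndex = this->SortedArrayPortal.GetNumberOfValues();
    while (leftIndex < rightIndex)
    {
      const Id middleIndex = leftIndex + (rightIndex - leftIndex) / 2;
      if (this->SortedArrayPortal.Get(middleIndex) < value)
      {
        leftIndex = middleIndex + 1; // answer lies to the right of middleIndex
      }
      else
      {
        rightIndex = middleIndex; // middleIndex is still a candidate
      }
    }
    return leftIndex;
  }
};

} // namespace sketch
```

For a sorted array `{2, 4, 4, 8}`, the functor maps 4 to index 1 (the first of the duplicates) and maps a missing value such as 5 to index 3, the position where it would be inserted.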
# Add `ArrayHandleView` fancy array
Added a new class named `ArrayHandleView` that allows you to get a subset
of an array. You use the `ArrayHandleView` by giving it a target array, a
starting index, and a length. Here is a simple example of usage:
``` cpp
vtkm::cont::ArrayHandle<vtkm::Id> sourceArray;
vtkm::cont::ArrayCopy(vtkm::cont::ArrayHandleIndex(10), sourceArray);
// sourceArray has [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

vtkm::cont::ArrayHandleView<vtkm::cont::ArrayHandle<vtkm::Id>>
  viewArray(sourceArray, 3, 5);
// viewArray has [3, 4, 5, 6, 7]
```
There is also a convenience `make_ArrayHandleView` function to create view
arrays. The following makes the same view array as before.
``` cpp
auto viewArray = vtkm::cont::make_ArrayHandleView(sourceArray, 3, 5);
```
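Conceptually, a view stores a reference to the target plus a start offset and a length, and translates indices on access. The sketch below models that with a `std::vector`; `VectorView` is a hypothetical class, and the bounds check and write-through behavior are choices made for this sketch rather than statements about `ArrayHandleView`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

namespace sketch // hypothetical names, not VTK-m classes
{

// A non-owning window onto [start, start + length) of a target vector.
template <typename T>
class VectorView
{
public:
  VectorView(std::vector<T>& target, std::size_t start, std::size_t length)
    : Target(&target), Start(start), Length(length)
  {
    if (start + length > target.size())
    {
      throw std::out_of_range("view exceeds the target array");
    }
  }

  std::size_t GetNumberOfValues() const { return this->Length; }

  // Index 0 of the view corresponds to index Start of the target.
  T Get(std::size_t index) const { return (*this->Target)[this->Start + index]; }

  // Writes go straight through to the target; no data is copied.
  void Set(std::size_t index, const T& value) { (*this->Target)[this->Start + index] = value; }

private:
  std::vector<T>* Target;
  std::size_t Start;
  std::size_t Length;
};

} // namespace sketch
```

With a source of `[0, 1, ..., 9]`, a view built with start 3 and length 5 reads back `[3, 4, 5, 6, 7]`, matching the example above.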
# `ArrayHandleCompositeVector` simplified and made writable.
`ArrayHandleCompositeVector` is now easier to use, as its type has a more
straightforward definition: `ArrayHandleCompositeVector<Array1, Array2, ...>`.
Previously, a helper metaprogramming struct was needed to determine the type
of the array handle.
In addition, the new implementation supports both reading and writing, whereas
the original version was read-only.
Another notable change is that the `ArrayHandleCompositeVector` no longer
supports component extraction from the source arrays. While the previous version
could take a source array with a `vtkm::Vec` `ValueType` and use only a single
component in the output, the new version requires that all input arrays have
the same `ValueType`, which becomes the `ComponentType` of the output.
When component extraction is needed, the classes `ArrayHandleSwizzle` and
`ArrayHandleExtractComponent` have been introduced to allow the previous
use cases to continue working efficiently.
# `ArrayHandleExtractComponent` target component is now set at runtime.
Rather than embedding the extracted component in a template parameter, the
extract operation is now defined at runtime.
This is easier to use and keeps compile times / sizes / memory requirements down.
# `ArrayHandleSwizzle` component maps are now set at runtime.
Rather than embedding the component map in the template parameters, the swizzle
operation is now defined at runtime using a `vtkm::Vec<vtkm::IdComponent, N>`
that maps the input components to the output components.
This is easier to use and keeps compile times / sizes / memory requirements down.
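The practical difference is that the component map becomes ordinary data instead of part of the type. A minimal sketch of a runtime swizzle over `std::array` follows; the `Swizzle` and `ExtractComponent` functions here are hypothetical helpers, not the VTK-m classes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace sketch // hypothetical helpers, not VTK-m classes
{

// Builds an output whose i-th component is input[map[i]].
// The map is a runtime value, so one instantiation serves every ordering.
template <typename T, std::size_t InSize, std::size_t OutSize>
std::array<T, OutSize> Swizzle(const std::array<T, InSize>& input,
                               const std::array<int, OutSize>& map)
{
  std::array<T, OutSize> output{};
  for (std::size_t i = 0; i < OutSize; ++i)
  {
    assert(map[i] >= 0 && static_cast<std::size_t>(map[i]) < InSize);
    output[i] = input[static_cast<std::size_t>(map[i])];
  }
  return output;
}

// Extracting a single component is the one-output special case.
template <typename T, std::size_t InSize>
T ExtractComponent(const std::array<T, InSize>& input, int component)
{
  return Swizzle<T, InSize, 1>(input, std::array<int, 1>{ { component } })[0];
}

} // namespace sketch
```

Only the input and output sizes remain template parameters in this sketch, which is why moving the map to runtime reduces the number of distinct instantiations.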
# Build System Redesign and new minimum CMake
VTK-m's CMake build system was redesigned to be more declarative for consumers.
This was done by moving away from the previous component design and
toward explicit targets. Additionally, VTK-m now uses the native CUDA support
introduced in CMake 3.8 and has the following minimum CMake versions:
- Visual Studio Generator requires CMake 3.11+
- CUDA support requires CMake 3.9+
- Otherwise CMake 3.3+ is supported
When VTK-m is found by `find_package`, it defines the following targets:
- `vtkm_cont`
  - contains all common core functionality
  - always exists
- `vtkm_rendering`
  - contains all the rendering code
  - exists only when rendering is enabled
  - rendering also provides a `vtkm_find_gl` function
    - allows you to find the GL (EGL, MESA, Hardware), GLUT, and GLEW
      versions that VTK-m was built with.
VTK-m also provides targets that represent what device adapters it
was built to support. The pattern for these targets is `vtkm::<device>`.
Currently we don't provide a target for the serial device.
- `vtkm::tbb`
  - Target that contains TBB-related link information;
    implicitly linked to by `vtkm_cont` if TBB was enabled
- `vtkm::cuda`
  - Target that contains CUDA-related link information;
    implicitly linked to by `vtkm_cont` if CUDA was enabled
VTK-m can be built with specific CPU architecture vectorization/optimization flags.
Consumers of the project can find these flags by looking at the `vtkm_vectorization_flags` target.
A project that wants to build an executable that uses VTK-m would look like:

    cmake_minimum_required(VERSION 3.3 FATAL_ERROR)
    project(HelloWorld CXX)

    # Find the VTK-m package.
    # Will automatically enable the CUDA language if needed (and bump the CMake minimum).
    find_package(VTKm REQUIRED)

    add_executable(HelloWorld HelloWorld.cxx)
    target_link_libraries(HelloWorld PRIVATE vtkm_cont)

    if(TARGET vtkm::cuda)
      set_source_files_properties(HelloWorld.cxx PROPERTIES LANGUAGE CUDA)
    endif()

# Cell measure functions, worklet, and filter
VTK-m now provides free functions, a worklet, and a filter for computing
the integral measure of a cell (i.e., its arc length, area, or volume).
The free functions are located in `vtkm/exec/CellMeasure.h` and share the
same signature:

    template<typename OutType, typename PointCoordVecType>
    OutType CellMeasure(
      const vtkm::IdComponent& numPts,
      const PointCoordVecType& pts,
      const vtkm::exec::FunctorBase& worklet);

The number of points argument is provided for cell-types such as lines,
which allow an arbitrary number of points per cell.
See the worklet for examples of their use.
The worklet is named `vtkm::worklet::CellMeasure` and takes a template
parameter that is a tag list of measures to include.
Cells that are not selected by the tag list return a measure of 0.
Some convenient tag lists are predefined for you:
+ `vtkm::ArcLength` will only compute the measure of cells with a 1-dimensional parameter-space.
+ `vtkm::Area` will only compute the measure of cells with a 2-dimensional parameter-space.
+ `vtkm::Volume` will only compute the measure of cells with a 3-dimensional parameter-space.
+ `vtkm::AllMeasures` will compute all of the above.
The filter version, named `vtkm::filter::CellMeasures` (plural, since
it produces a cell-centered array of measures), takes the same template
parameter and tag lists as the worklet.
By default, the output array of measure values is named "measure" but
the filter accepts other names via the `SetCellMeasureName()` method.
The only cell type that is not supported is the polygon;
you must triangulate polygons before running this filter.
See the unit tests for examples of how to use the worklet and filter.
The cell measures are all signed: negative measures indicate that the cell is inverted.
Simplicial cells (points, lines, triangles, tetrahedra) cannot be inverted
by definition and thus always return values greater than or equal to 0.0.
Negative values indicate either that the order in which a cell's vertices appear in its
connectivity array is improper or that the relative locations of the vertices in world
coordinates result in a cell with a negative Jacobian somewhere in its interior.
Finally, note that cell measures may return invalid (NaN) or infinite (Inf, -Inf)
values if the cell is poorly defined, e.g., has coincident vertices
or a parametric dimension larger than the space spanned by its world-coordinate points.
The verdict mesh quality library was used as the source of the methods
for approximating the cell measures.
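To make the notion of a signed measure concrete, the sketch below computes the signed area of a planar quadrilateral with the shoelace formula: the sign flips when the vertex order is reversed. This is a generic illustration with hypothetical names (`Point2`, `SignedQuadArea`), not the verdict-derived method used by `CellMeasure`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace sketch // hypothetical names, not VTK-m functions
{

struct Point2
{
  double X;
  double Y;
};

// Signed area of a planar quadrilateral via the shoelace formula:
// positive for counter-clockwise vertex order, negative for clockwise.
inline double SignedQuadArea(const std::array<Point2, 4>& pts)
{
  double twiceArea = 0.0;
  for (std::size_t i = 0; i < pts.size(); ++i)
  {
    const Point2& a = pts[i];
    const Point2& b = pts[(i + 1) % pts.size()];
    twiceArea += a.X * b.Y - b.X * a.Y;
  }
  return 0.5 * twiceArea;
}

} // namespace sketch
```

A unit square listed counter-clockwise gives an area of 1, and the same square listed in the opposite order gives -1, which is the kind of result that signals an improperly ordered connectivity array.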
# CellSetExplicit now caches CellToPoint table when used with Invoke.
Issue #268 highlighted an issue where the expensive CellToPoint table
update was not properly cached when a CellSetExplicit was used with a
filter. This has been corrected by ensuring that the metadata
associated with the table survives shallow copying of the CellSet.
New methods are also added to check whether the CellToPoint table
exists, and also to reset it if needed (e.g. for benchmarking):
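
    vtkm::cont::CellSetExplicit<> cellSet = ...;
    // Check if the CellToPoint table has already been computed:
    if (cellSet.HasConnectivity(vtkm::TopologyElementTagCell{},
                                vtkm::TopologyElementTagPoint{}))
    {
      // Reset it:
      cellSet.ResetConnectivity(vtkm::TopologyElementTagCell{}, vtkm::TopologyElementTagPoint{});
    }
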
# Make RuntimeDeviceInformation class template independent
By making the `RuntimeDeviceInformation` class template independent, VTK-m is able to detect
device info at runtime with a runtime-specified `deviceId`. In the past this was impossible
because the CRTP pattern does not allow function overloading (the compiler would complain
that `DeviceAdapterRuntimeDetector` does not have an `Exists()` function defined).
# User defined execution objects now usable with runtime selection of device adapter
- Changed how execution objects are created and passed from the control environment to the execution environment. See chapter 13.9 on worklets in the user manual for details.
- A class that extends `vtkm::cont::ExecutionObjectBase` now implements `PrepareForExecution()`, which creates the execution object for the execution environment. This way the class that extends `vtkm::cont::ExecutionObjectBase` does not have to be templated on the device.
Example of new execution object:

    template <typename Device>
    struct ExecutionObject
    {
      vtkm::Int32 Number;
    };

    struct TestExecutionObject : public vtkm::cont::ExecutionObjectBase
    {
      vtkm::Int32 Number;

      template <typename Device>
      VTKM_CONT ExecutionObject<Device> PrepareForExecution(Device) const
      {
        ExecutionObject<Device> object;
        object.Number = this->Number;
        return object;
      }
    };

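The reason this pattern works with runtime device selection is that the control-side class is not a template; only its `PrepareForExecution` method is. A stand-alone sketch of that dispatch follows, using hypothetical device tags and a hypothetical `NumberSeenOnDevice` helper in place of the real VTK-m machinery.

```cpp
#include <cassert>

namespace sketch // hypothetical stand-ins, not VTK-m types
{

struct DeviceTagSerial {};
struct DeviceTagOther {};

enum class DeviceId { Serial, Other };

// Execution-side object: templated on the device it was prepared for.
template <typename Device>
struct ExecutionObject
{
  int Number;
};

// Control-side object: a plain class, so it can be stored and passed
// around before the device is known.
struct TestExecutionObject
{
  int Number;

  template <typename Device>
  ExecutionObject<Device> PrepareForExecution(Device) const
  {
    ExecutionObject<Device> object;
    object.Number = this->Number;
    return object;
  }
};

// Maps a runtime device ID to a compile-time tag and prepares the object.
inline int NumberSeenOnDevice(const TestExecutionObject& controlSide, DeviceId id)
{
  switch (id)
  {
    case DeviceId::Serial:
      return controlSide.PrepareForExecution(DeviceTagSerial{}).Number;
    case DeviceId::Other:
      return controlSide.PrepareForExecution(DeviceTagOther{}).Number;
  }
  return -1; // unknown device
}

} // namespace sketch
```

The same `TestExecutionObject` instance can be handed to either branch, which would not be possible if the class itself carried the device as a template parameter.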
# Use strongly typed enums for vtkm::cont::Field
With strongly typed enums, the compiler no longer implicitly converts these
values to `int`s, which could cause unexpected behavior.
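The language rule behind this change can be checked at compile time: an unscoped enum converts to `int` implicitly, while an `enum class` does not. The enumerator names below are hypothetical and are not the actual `vtkm::cont::Field` values.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch // hypothetical enums, not the VTK-m definitions
{

// Old style: unscoped enum, silently usable wherever an int is expected.
enum LegacyAssociation
{
  LEGACY_ANY,
  LEGACY_POINTS
};

// New style: scoped (strongly typed) enum.
enum class Association
{
  Any,
  Points,
  CellSet
};

static_assert(std::is_convertible<LegacyAssociation, int>::value,
              "unscoped enums convert to int implicitly");
static_assert(!std::is_convertible<Association, int>::value,
              "scoped enums do not convert to int implicitly");

// When an integer is really wanted, the conversion must be spelled out.
inline int ToInt(Association association)
{
  return static_cast<int>(association);
}

} // namespace sketch
```

Code that accidentally passed an association where an integer argument was expected now fails to compile instead of silently picking the wrong overload.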
# Add new option to VTKm_CUDA_Architecture
A new VTKm_CUDA_Architecture option called 'none' has been added. This will
disable all VTK-m generated cuda architecture flags, allowing the user to
specify their own custom flags.
This is useful when VTK-m is used as a library in another project and the project wants
to use its own architecture flags.
# Worklets are now asynchronous in Cuda
Worklets are now fully asynchronous in the cuda backend. This means that
worklet errors are reported asynchronously. Existing errors are checked for
before invocation of a new worklet and at explicit synchronization points like
An important effect of this change is that functions that are synchronization
points, like `ArrayHandle::GetPortalControl()` and
`ArrayHandle::GetPortalConstControl()`, may now throw exceptions for errors from
previously executed worklets.
Worklet invocations, synchronization, and error reporting happen independently
on different threads. Therefore, synchronization on one thread does not affect
any other threads.
# Add support for deferred freeing of cuda memory
A new function, `void CudaAllocator::FreeDeferred(void* ptr, std::size_t numBytes)` has
been added that can be used to defer the freeing of cuda memory to a later point.
This is useful because `cudaFree` causes a global sync across all cuda streams. This function
internally maintains a pool of to-be-freed pointers that are freed together when a
size threshold is reached. This way a number of global syncs are collected together at
one point.
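The batching idea can be sketched on the host with `std::free` standing in for `cudaFree`. The `DeferredFreePool` class, its threshold policy, and its accessors are assumptions made for this illustration and do not describe `CudaAllocator` internals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

namespace sketch // hypothetical class, not VTK-m's CudaAllocator
{

// Collects pointers and frees them in one batch once enough bytes are pending.
class DeferredFreePool
{
public:
  explicit DeferredFreePool(std::size_t thresholdBytes)
    : Threshold(thresholdBytes)
  {
  }

  DeferredFreePool(const DeferredFreePool&) = delete;
  DeferredFreePool& operator=(const DeferredFreePool&) = delete;

  ~DeferredFreePool() { this->Flush(); }

  // Queues ptr for freeing; flushes when the pending size reaches the threshold.
  void FreeDeferred(void* ptr, std::size_t numBytes)
  {
    this->Pending.push_back(ptr);
    this->PendingBytes += numBytes;
    if (this->PendingBytes >= this->Threshold)
    {
      this->Flush();
    }
  }

  // Frees everything queued so far in a single pass.
  void Flush()
  {
    if (this->Pending.empty())
    {
      return;
    }
    for (void* ptr : this->Pending)
    {
      std::free(ptr); // stand-in for cudaFree, which would synchronize
    }
    this->Pending.clear();
    this->PendingBytes = 0;
    ++this->FlushCount;
  }

  std::size_t GetPendingCount() const { return this->Pending.size(); }
  int GetFlushCount() const { return this->FlushCount; }

private:
  std::size_t Threshold;
  std::size_t PendingBytes = 0;
  int FlushCount = 0;
  std::vector<void*> Pending;
};

} // namespace sketch
```

With a 1024-byte threshold, two 400-byte frees stay queued and the third triggers a single flush, so three costly synchronizations collapse into one batch.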
# VTK-m Worklets now execute on Cuda using grid stride loops
Previously VTK-m Worklets used what is referred to as a monolithic kernel
pattern for worklet execution. This assumes a single large grid of threads
to process an entire array in a single pass. This resulted in launches that
looked like:

    template<typename F>
    void TaskSingular(F f, vtkm::Id end)
    {
      const vtkm::Id index = static_cast<vtkm::Id>(blockDim.x * blockIdx.x + threadIdx.x);
      if (index < end)
      {
        f(index);
      }
    }

    Schedule1DIndexKernel<TaskSingular><<<totalBlocks, 128, 0, cudaStreamPerThread>>>(
      functor, numInstances);
This was problematic as it had the drawbacks of:
- Not being able to reuse any infrastructure between kernel executions.
- Harder to tune performance based on the current hardware.
The solution was to move to a grid stride loop strategy with a block size
based on the number of SMs on the executing GPU. The result is something
that looks like:

    template<typename F>
    void TaskStrided(F f, vtkm::Id end)
    {
      const vtkm::Id start = blockIdx.x * blockDim.x + threadIdx.x;
      const vtkm::Id inc = blockDim.x * gridDim.x;
      for (vtkm::Id index = start; index < end; index += inc)
      {
        f(index);
      }
    }

    Schedule1DIndexKernel<TaskStrided><<<32*numSMs, 128, 0, cudaStreamPerThread>>>(
      functor, numInstances);
With a loop stride equal to grid size we maintain the optimal memory
coalescing patterns as we had with the monolithic version. These changes
also allow VTK-m to optimize TaskStrided so that it can reuse infrastructure
between iterations.
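A quick way to convince yourself that a grid stride loop covers every index exactly once, regardless of how the grid size relates to the problem size, is to emulate the threads on the CPU. The sketch below does that in plain C++; `EmulateGridStride` and `CountVisits` are hypothetical helpers, and the loops over blocks and threads run sequentially here rather than in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch // hypothetical helpers, CPU emulation only
{

using Id = std::int64_t; // stand-in for vtkm::Id

// Runs, one after another, the loop each CUDA thread would execute.
template <typename F>
void EmulateGridStride(F f, Id end, Id gridDim, Id blockDim)
{
  const Id inc = blockDim * gridDim; // total number of threads in the grid
  for (Id blockIdx = 0; blockIdx < gridDim; ++blockIdx)
  {
    for (Id threadIdx = 0; threadIdx < blockDim; ++threadIdx)
    {
      const Id start = blockIdx * blockDim + threadIdx;
      for (Id index = start; index < end; index += inc)
      {
        f(index);
      }
    }
  }
}

// Counts how many times each index in [0, end) is visited.
inline std::vector<int> CountVisits(Id end, Id gridDim, Id blockDim)
{
  std::vector<int> visits(static_cast<std::size_t>(end), 0);
  EmulateGridStride(
    [&visits](Id index) { ++visits[static_cast<std::size_t>(index)]; }, end, gridDim, blockDim);
  return visits;
}

} // namespace sketch
```

Whether the grid has fewer threads than work items (each thread loops several times) or more (surplus threads do nothing), every index is processed exactly once, so the launch configuration can be chosen from the hardware rather than from the array size.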
# DeviceAdapterId has become a real constexpr type and not an alias to vtkm::UInt8
As part of the ability to support `vtkm::cont::TryExecuteOnDevice`, VTK-m has made
`DeviceAdapterId` a real constexpr type instead of a `vtkm::UInt8`.
The benefits of a real type are as follows:
- Easier to add functionality like range verification, which previously had
to be located in each user of `DeviceAdapterId`.
- No more ambiguous arguments. Previously it wasn't perfectly clear
what a method parameter of `vtkm::UInt8` represented. Was it actually the
`DeviceAdapterId` or something else?
- Ability to add subclasses that represent things such as Undefined, Error, or Any.
The implementation of DeviceAdapterId is: