Commit 81ac8771 authored by Robert Maynard

Merge branch 'release_notes_1.3.0'

parents b6b123e9 57127195
# 0-sample-topic
This is a sample release note for the change in a topic.
Developers should add similar notes for each topic branch
making a noteworthy change. Each document should be named
and titled to match the topic name to avoid merge conflicts.
# Add float version operations for vtkm::Math Pi()
Pi-related functions are now evaluated at compile time as `constexpr` functions.
This change also replaces the old `static_cast<T>(vtkm::Pi())` usages with
templated ones and fixes several conversion warnings.
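As a rough illustration of the idea (not the actual VTK-m implementation), a templated `constexpr` function lets the compiler produce the constant in the requested precision, so no cast is needed at the call site. The `sketch` namespace and the function names below are hypothetical stand-ins.

```cpp
namespace sketch
{
// Hypothetical stand-in for a templated, compile-time Pi.
template <typename T>
constexpr T Pi()
{
  return static_cast<T>(3.14159265358979323846L);
}

// Derived constants are evaluated at compile time as well.
template <typename T>
constexpr T TwoPi()
{
  return Pi<T>() * static_cast<T>(2);
}
} // namespace sketch

// The values are usable in constant expressions.
static_assert(sketch::Pi<float>() > 3.14f && sketch::Pi<float>() < 3.15f, "float Pi");
static_assert(sketch::TwoPi<double>() > 6.28 && sketch::TwoPi<double>() < 6.29, "double TwoPi");
```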
# Add a release resources API to CellSet and its derived classes
We now offer the ability to unload execution memory from `CellSet` and its derived
classes (`CellSetExplicit`, `CellSetPermutation` and `CellSetStructured`) using the
`ReleaseResourcesExecution` method.
# TryExecuteOnDevice allows for runtime selection of which device to execute on
VTK-m now offers `vtkm::cont::TryExecuteOnDevice` to let the user select
which device to execute a function on at runtime. The original `vtkm::cont::TryExecute`
used the first valid device, which meant users had to modify the runtime state
through the `RuntimeTracker`, which was verbose and unwieldy.
Here is an example of how you can execute a function on the device that an array handle was last executed
on:
```cpp
struct ArrayCopyFunctor
{
  template <typename Device, typename InArray, typename OutArray>
  VTKM_CONT bool operator()(Device, const InArray& src, OutArray& dest)
  {
    vtkm::cont::DeviceAdapterAlgorithm<Device>::Copy(src, dest);
    return true;
  }
};

template<typename T, typename InStorage, typename OutStorage>
void SmartCopy(const vtkm::cont::ArrayHandle<T, InStorage>& src, vtkm::cont::ArrayHandle<T, OutStorage>& dest)
{
  // Ask the source array which device it was last used on
  // (assumes ArrayHandle::GetDeviceAdapterId()).
  vtkm::cont::DeviceAdapterId devId = src.GetDeviceAdapterId();

  // Try that device first; fall back to the first valid device otherwise.
  bool success = vtkm::cont::TryExecuteOnDevice(devId, ArrayCopyFunctor(), src, dest);
  if (!success)
  {
    vtkm::cont::TryExecute(ArrayCopyFunctor(), src, dest);
  }
}
```
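To make the mechanism more concrete, here is a minimal, self-contained sketch of how a runtime device id can be mapped back to a compile-time tag before a functor is invoked. It does not use VTK-m; the tags, the integer ids, and the `sketch::TryExecuteOnDevice` function are all hypothetical, and the real implementation may differ.

```cpp
namespace sketch
{
// Hypothetical compile-time device tags, each with a runtime id.
struct SerialTag
{
  static constexpr int Id() { return 1; }
  static constexpr bool Enabled() { return true; }
};
struct CudaTag
{
  static constexpr int Id() { return 2; }
  static constexpr bool Enabled() { return false; } // pretend CUDA was not built
};

// Run the functor on Tag only if the runtime id matches and the device is enabled.
template <typename Tag, typename Functor>
bool TryOne(int requestedId, Functor& functor)
{
  return (requestedId == Tag::Id()) && Tag::Enabled() && functor(Tag{});
}

// Map a runtime id back to a compile-time tag; returns false when nothing ran.
template <typename Functor>
bool TryExecuteOnDevice(int requestedId, Functor functor)
{
  return TryOne<SerialTag>(requestedId, functor) || TryOne<CudaTag>(requestedId, functor);
}

// Example functor: records which device it ran on.
struct RecordDevice
{
  int* LastDevice;

  template <typename Tag>
  bool operator()(Tag) const
  {
    *this->LastDevice = Tag::Id();
    return true;
  }
};
} // namespace sketch
```

A `false` return tells the caller that the requested device was unavailable, which is the point at which the `SmartCopy` example falls back to `TryExecute`.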
# vtkm::cont::Algorithm now can be told which device to use at runtime
`vtkm::cont::Algorithm` has been extended so that the user can specify
which device to use at runtime. Previously, `Algorithm` would only use the first
enabled device, requiring users to modify the `vtkm::cont::GlobalRuntimeDeviceTracker`
if they wanted a specific device used.
To select a specific device with `vtkm::cont::Algorithm`, pass the `vtkm::cont::DeviceAdapterId`
as the first parameter.
```cpp
vtkm::cont::ArrayHandle<double> values;
//call with no tag, will run on first enabled device
auto result = vtkm::cont::Algorithm::Reduce(values, 0.0);
//call with an explicit device tag, will only run on serial
vtkm::cont::DeviceAdapterTagSerial serial;
result = vtkm::cont::Algorithm::Reduce(serial, values, 0.0);
//call with a runtime device id, will only run on serial
vtkm::cont::DeviceAdapterId device = serial;
result = vtkm::cont::Algorithm::Reduce(device, values, 0.0);
```
# Add a common API for CoordinateSystem to unload execution resources
We now offer the ability to unload execution memory from `ArrayHandleVirtualCoordinate`
and `CoordinateSystem` using the `ReleaseResourcesExecution` method.
`Field` now has a `ReleaseResourcesExecution` method as well.
This commit also fixes a bug where the `ArrayTransfer` of `ArrayHandleVirtualCoordinate`
did not release execution resources properly.
# Allow histogram filter to take custom types
By passing `TypeList` and `StorageList` types into `FieldRangeGlobalCompute`,
upstream users (VTK) can pass custom types into the histogram filter.
# Allow disabling/enabling of CUDA managed memory through an environment variable
By setting the environment variable "VTKM_MANAGEDMEMO_DISABLED" to 1,
users can disable CUDA managed memory even when the hardware is capable
of supporting it.
# VTK-m ArrayHandle can now take ownership of a user allocated memory location
Previously memory that was allocated outside of VTK-m was impossible to transfer to
VTK-m as we didn't know how to free it. By extending the ArrayHandle constructors
to support a Storage object that is being moved, we can clearly express that
the ArrayHandle now owns memory it didn't allocate.
Here is an example of how this is done:
```cpp
T* buffer = new T[100];
auto user_free_function = [](void* ptr) { delete[] static_cast<T*>(ptr); };
vtkm::cont::internal::Storage<T, vtkm::cont::StorageTagBasic>
storage(buffer, 100, user_free_function);
vtkm::cont::ArrayHandle<T> arrayHandle(std::move(storage));
```
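The ownership hand-off can be sketched without VTK-m as a move-only storage object that carries the user's free function. The `sketch::Storage` and `sketch::ArrayHandle` classes below are hypothetical simplifications (fixed to `int`, no device support) meant only to show why moving the storage makes the transfer of ownership explicit.

```cpp
#include <cstddef>
#include <functional>
#include <utility>

namespace sketch
{
// Hypothetical move-only storage that remembers how to free user memory.
class Storage
{
public:
  Storage(int* data, std::size_t size, std::function<void(void*)> deleter)
    : Data(data), Size(size), Deleter(std::move(deleter))
  {
  }
  Storage(Storage&& other)
    : Data(other.Data), Size(other.Size), Deleter(std::move(other.Deleter))
  {
    other.Data = nullptr; // the moved-from object no longer owns the buffer
    other.Size = 0;
  }
  Storage(const Storage&) = delete;
  Storage& operator=(const Storage&) = delete;
  ~Storage()
  {
    if (this->Data && this->Deleter)
    {
      this->Deleter(this->Data); // call the user-supplied free function
    }
  }
  std::size_t GetSize() const { return this->Size; }

private:
  int* Data;
  std::size_t Size;
  std::function<void(void*)> Deleter;
};

// Hypothetical array handle that takes ownership by consuming the storage.
class ArrayHandle
{
public:
  explicit ArrayHandle(Storage&& storage) : Owned(std::move(storage)) {}
  std::size_t GetNumberOfValues() const { return this->Owned.GetSize(); }

private:
  Storage Owned;
};
} // namespace sketch
```

Because the storage cannot be copied, exactly one object is responsible for calling the free function, and it is called once when the handle is destroyed.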
# Allow ArrayHandleTransform to work with ExecObject
Previously, the `ArrayHandleTransform` class only worked with plain old
data (POD) objects as its functors. For simple transforms, this makes sense
since all the data comes from a target `ArrayHandle` that will be sent to
the device through a different path. However, this also requires the
transform to be known at compile time.
There are cases, though, where the functor cannot be a POD object and has
to be built for a specific device. There are numerous reasons for this. One
might be that you need some lookup tables. Another might be that you want to
support a virtual object, which has to be initialized for a particular
device. The standard way to implement this in VTK-m is to create an
"execution object." In practice, this means creating a wrapper around the
execution object that inherits from
`vtkm::cont::ExecutionAndControlObjectBase` and provides a
`PrepareForExecution` method and a `PrepareForControl` method.
As an example, consider the use case of a special `ArrayHandle` that takes
the value in one array and returns the index of that value in another
sorted array. We can do that by creating a functor that finds a value in an
array and returns the index.
``` cpp
template <typename ArrayPortalType>
struct FindValueFunctor
{
  ArrayPortalType SortedArrayPortal;

  FindValueFunctor() = default;

  VTKM_CONT FindValueFunctor(const ArrayPortalType& sortedPortal)
    : SortedArrayPortal(sortedPortal)
  { }

  VTKM_EXEC vtkm::Id operator()(const typename ArrayPortalType::ValueType& value) const
  {
    // Binary search (array assumed sorted in ascending order) for the first
    // index whose value is not less than `value`.
    vtkm::Id leftIndex = 0;
    vtkm::Id rightIndex = this->SortedArrayPortal.GetNumberOfValues();
    while (leftIndex < rightIndex)
    {
      vtkm::Id middleIndex = (leftIndex + rightIndex) / 2;
      auto middleValue = this->SortedArrayPortal.Get(middleIndex);
      if (value <= middleValue)
      {
        rightIndex = middleIndex;
      }
      else
      {
        leftIndex = middleIndex + 1;
      }
    }
    return leftIndex;
  }
};
```
Simple enough, except that the type of `ArrayPortalType` depends on what
device the functor runs on (not to mention its memory might need to be
moved to different hardware). We can now solve this problem by creating an
execution object that sets the functor up for a device. `ArrayHandle`s also need to be
able to provide portals that run in the control environment, and for that
we need a special version of the functor for the control environment.
``` cpp
template <typename ArrayHandleType>
struct FindValueExecutionObject : vtkm::cont::ExecutionAndControlObjectBase
{
  VTKM_IS_ARRAY_HANDLE(ArrayHandleType);

  ArrayHandleType SortedArray;

  FindValueExecutionObject() = default;

  VTKM_CONT FindValueExecutionObject(const ArrayHandleType& sortedArray)
    : SortedArray(sortedArray)
  { }

  // Build the functor around the portal type that PrepareForInput returns
  // for the given device.
  template <typename Device>
  VTKM_CONT
  FindValueFunctor<decltype(std::declval<ArrayHandleType>().PrepareForInput(Device()))>
  PrepareForExecution(Device device)
  {
    using FunctorType =
      FindValueFunctor<decltype(std::declval<ArrayHandleType>().PrepareForInput(Device()))>;
    return FunctorType(this->SortedArray.PrepareForInput(device));
  }

  // Control-environment version of the same functor.
  VTKM_CONT
  FindValueFunctor<typename ArrayHandleType::PortalConstControl>
  PrepareForControl()
  {
    using FunctorType =
      FindValueFunctor<typename ArrayHandleType::PortalConstControl>;
    return FunctorType(this->SortedArray.GetPortalConstControl());
  }
};
```
Now you can use this execution object in an `ArrayHandleTransform`. It will
automatically be detected as an execution object and be converted to a
functor in the execution environment.
``` cpp
auto transformArray =
vtkm::cont::make_ArrayHandleTransform(
inputArray, FindValueExecutionObject<decltype(sortedArray)>(sortedArray));
```
# Add `ArrayHandleView` fancy array
Added a new class named `ArrayHandleView` that allows you to get a subset
of an array. You use the `ArrayHandleView` by giving it a target array, a
starting index, and a length. Here is a simple example of usage:
``` cpp
vtkm::cont::ArrayHandle<vtkm::Id> sourceArray;
vtkm::cont::ArrayCopy(vtkm::cont::ArrayHandleIndex(10), sourceArray);
// sourceArray has [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
vtkm::cont::ArrayHandleView<vtkm::cont::ArrayHandle<vtkm::Id>>
viewArray(sourceArray, 3, 5);
// viewArray has [3, 4, 5, 6, 7]
```
There is also a convenience `make_ArrayHandleView` function to create view
arrays. The following makes the same view array as before.
``` cpp
auto viewArray = vtkm::cont::make_ArrayHandleView(sourceArray, 3, 5);
```
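Conceptually, a view only stores a reference to the source plus an offset and a length, and forwards each index. The following is a minimal sketch of that idea over a `std::vector`; the `sketch::View` class is hypothetical and is not how `ArrayHandleView` is implemented.

```cpp
#include <cstddef>
#include <vector>

namespace sketch
{
// Hypothetical read-only view: index i is forwarded to source[start + i].
template <typename T>
class View
{
public:
  View(const std::vector<T>& source, std::size_t start, std::size_t length)
    : Source(&source), Start(start), Length(length)
  {
  }
  std::size_t GetNumberOfValues() const { return this->Length; }
  const T& Get(std::size_t index) const { return (*this->Source)[this->Start + index]; }

private:
  const std::vector<T>* Source; // not owned; the source must outlive the view
  std::size_t Start;
  std::size_t Length;
};
} // namespace sketch
```

No data is copied, so a change to the source is visible through the view.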
# `ArrayHandleCompositeVector` simplified and made writable.
`ArrayHandleCompositeVector` is now easier to use, as its type has a more
straightforward definition: `ArrayHandleCompositeVector<Array1, Array2, ...>`.
Previously, a helper metaprogramming struct was needed to determine the type
of the array handle.
In addition, the new implementation supports both reading and writing, whereas
the original version was read-only.
Another notable change is that the `ArrayHandleCompositeVector` no longer
supports component extraction from the source arrays. While the previous version
could take a source array with a `vtkm::Vec` `ValueType` and use only a single
component in the output, the new version requires that all input arrays have
the same `ValueType`, which becomes the `ComponentType` of the output
`vtkm::Vec`.
When component extraction is needed, the classes `ArrayHandleSwizzle` and
`ArrayHandleExtractComponent` have been introduced to allow the previous
use cases to continue working efficiently.
# `ArrayHandleExtractComponent` target component is now set at runtime.
Rather than embedding the extracted component in a template parameter, the
extract operation is now defined at runtime.
This is easier to use and keeps compile times / sizes / memory requirements
down.
# `ArrayHandleSwizzle` component maps are now set at runtime.
Rather than embedding the component map in the template parameters, the swizzle
operation is now defined at runtime using a `vtkm::Vec<vtkm::IdComponent, N>`
that maps the input components to the output components.
This is easier to use and keeps compile times / sizes / memory requirements
down.
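As a small illustration of a runtime component map, the sketch below applies a map to a fixed-size vector so that `output[i] = input[map[i]]`. It uses `std::array` in place of `vtkm::Vec`, and the `sketch::Swizzle` function is a hypothetical helper, not part of VTK-m.

```cpp
#include <array>
#include <cstddef>

namespace sketch
{
// Hypothetical swizzle: output[i] = input[map[i]], with the map chosen at runtime.
template <typename T, std::size_t InSize, std::size_t OutSize>
std::array<T, OutSize> Swizzle(const std::array<T, InSize>& input,
                               const std::array<int, OutSize>& map)
{
  std::array<T, OutSize> output{};
  for (std::size_t i = 0; i < OutSize; ++i)
  {
    output[i] = input[static_cast<std::size_t>(map[i])];
  }
  return output;
}
} // namespace sketch
```

Only the sizes are template parameters here; the map values themselves can change between calls without instantiating new code, which is where the savings in compile time and binary size come from.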
# Build System Redesign and new minimum CMake
The VTK-m CMake build system was redesigned to be more declarative for consumers.
This was done by moving away from the previous component design and toward
explicit targets. Additionally, VTK-m now uses the native CUDA support
introduced in CMake 3.8 and has the following minimum CMake versions:
- Visual Studio Generator requires CMake 3.11+
- CUDA support requires CMake 3.9+
- Otherwise CMake 3.3+ is supported
When VTK-m is found with `find_package`, it defines the following targets:
- `vtkm_cont`
  - contains all common core functionality
  - always exists
- `vtkm_rendering`
  - contains all the rendering code
  - exists only when rendering is enabled
  - rendering also provides a `vtkm_find_gl` function
    - allows you to find the GL (EGL, MESA, Hardware), GLUT, and GLEW
      versions that VTK-m was built with.
VTK-m also provides targets that represent the device adapters it
was built to support. The pattern for these targets is `vtkm::<device>`.
Currently we don't provide a target for the serial device.
- `vtkm::tbb`
  - Target that contains TBB-related link information;
    implicitly linked to by `vtkm_cont` if TBB was enabled
- `vtkm::cuda`
  - Target that contains CUDA-related link information;
    implicitly linked to by `vtkm_cont` if CUDA was enabled
VTK-m can be built with specific CPU architecture vectorization/optimization flags.
Consumers of the project can find these flags by looking at the `vtkm_vectorization_flags`
target.
A project that wants to build an executable that uses VTK-m would look like:
```cmake
cmake_minimum_required(VERSION 3.3 FATAL_ERROR)
project(HelloWorld CXX)

#Find the VTK-m package.
#Will automatically enable the CUDA language if needed ( and bump CMake minimum )
find_package(VTKm REQUIRED)

add_executable(HelloWorld HelloWorld.cxx)
target_link_libraries(HelloWorld PRIVATE vtkm_cont)

if(TARGET vtkm::cuda)
  set_source_files_properties(HelloWorld.cxx PROPERTIES LANGUAGE CUDA)
endif()
```
# Cell measure functions, worklet, and filter
VTK-m now provides free functions, a worklet, and a filter for computing
the integral measure of a cell (i.e., its arc length, area, or volume).
The free functions are located in `vtkm/exec/CellMeasure.h` and share the
same signature:
```c++
template<typename OutType, typename PointCoordVecType>
OutType CellMeasure(
  const vtkm::IdComponent& numPts,
  const PointCoordVecType& pts,
  CellShapeTag,
  const vtkm::exec::FunctorBase& worklet);
```
The number of points argument is provided for cell-types such as lines,
which allow an arbitrary number of points per cell.
See the worklet for examples of their use.
The worklet is named `vtkm::worklet::CellMeasure` and takes a template
parameter that is a tag list of measures to include.
Cells that are not selected by the tag list return a measure of 0.
Some convenient tag lists are predefined for you:
+ `vtkm::ArcLength` will only compute the measure of cells with a 1-dimensional parameter-space.
+ `vtkm::Area` will only compute the measure of cells with a 2-dimensional parameter-space.
+ `vtkm::Volume` will only compute the measure of cells with a 3-dimensional parameter-space.
+ `vtkm::AllMeasures` will compute all of the above.
The filter version, named `vtkm::filter::CellMeasures` (plural since
it produces a cell-centered array of measures), takes the same template
parameter and tag lists as the worklet.
By default, the output array of measure values is named "measure" but
the filter accepts other names via the `SetCellMeasureName()` method.
The only cell type that is not supported is the polygon;
you must triangulate polygons before running this filter.
See the unit tests for examples of how to use the worklet and filter.
The cell measures are all signed: negative measures indicate that the cell is inverted.
Simplicial cells (points, lines, triangles, tetrahedra) cannot be inverted
by definition and thus always return values greater than or equal to 0.0.
Negative values indicate either that the order in which vertices appear in the cell's connectivity
array is improper or that the relative locations of the vertices in world coordinates
result in a cell with a negative Jacobian somewhere in its interior.
Finally, note that cell measures may return invalid (NaN) or infinite (Inf, -Inf)
values if the cell is poorly defined, e.g., has coincident vertices
or a parametric dimension larger than the space spanned by its world-coordinate
vertices.
The Verdict mesh quality library was used as the source of the methods
for approximating the cell measures.
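To illustrate how vertex ordering can flip the sign of a measure, here is a small standalone sketch that computes the signed area of a planar quadrilateral with the shoelace formula. This is not the Verdict formulation used by VTK-m; the `sketch::SignedArea` function and its sign convention (counter-clockwise is positive) are assumptions made for the example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

namespace sketch
{
using Point2 = std::pair<double, double>;

// Signed area of a planar cell via the shoelace formula.
// Counter-clockwise vertex order gives a positive value, clockwise a negative one.
inline double SignedArea(const std::vector<Point2>& pts)
{
  double twiceArea = 0.0;
  const std::size_t n = pts.size();
  for (std::size_t i = 0; i < n; ++i)
  {
    const Point2& a = pts[i];
    const Point2& b = pts[(i + 1) % n];
    twiceArea += a.first * b.second - b.first * a.second;
  }
  return 0.5 * twiceArea;
}
} // namespace sketch
```

Listing the corners of a unit square counter-clockwise gives an area of 1, while listing the same corners in reverse order gives -1, which corresponds to the "improper connectivity order" case described above.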
# CellSetExplicit now caches CellToPoint table when used with Invoke.
Issue #268 highlighted an issue where the expensive CellToPoint table
update was not properly cached when a CellSetExplicit was used with a
filter. This has been corrected by ensuring that the metadata
associated with the table survives shallow copying of the CellSet.
New methods are also added to check whether the CellToPoint table
exists, and also to reset it if needed (e.g. for benchmarking):
```
vtkm::cont::CellSetExplicit<> cellSet = ...;
// Check if the CellToPoint table has already been computed:
if (cellSet.HasConnectivity(vtkm::TopologyElementTagCell{},
                            vtkm::TopologyElementTagPoint{}))
{
  // Reset it:
  cellSet.ResetConnectivity(vtkm::TopologyElementTagCell{},
                            vtkm::TopologyElementTagPoint{});
}
```
# Make RuntimeDeviceInformation class template independent
By making the `RuntimeDeviceInformation` class template independent, VTK-m is able to detect
device info at runtime with a runtime-specified device id. Previously this was impossible
because the CRTP pattern does not allow function overloading (the compiler would complain
that `DeviceAdapterRuntimeDetector` does not have an `Exists()` function defined).
# User defined execution objects now usable with runtime selection of device adapter
- Changed how execution objects are created and passed from the control environment to the execution environment. See chapter 13.9 on worklets in the user manual for details.
- Instead of templating the control-side class on the device, you now fill out a class that extends `vtkm::cont::ExecutionObjectBase` and implement `PrepareForExecution()`, which creates the execution object for the execution environment. This way the class that extends `vtkm::cont::ExecutionObjectBase` does not have to be templated on the device.
Example of new execution object:
```cpp
template <typename Device>
struct ExecutionObject
{
  vtkm::Int32 Number;
};

struct TestExecutionObject : public vtkm::cont::ExecutionObjectBase
{
  vtkm::Int32 Number;

  template <typename Device>
  VTKM_CONT ExecutionObject<Device> PrepareForExecution(Device) const
  {
    ExecutionObject<Device> object;
    object.Number = this->Number;
    return object;
  }
};
```
# Use the strong typed enums for vtkm::cont::Field
With strongly typed enums, the compiler no longer implicitly converts these
values into `int`s, which could cause unexpected behavior.
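The difference can be checked directly with type traits. In the sketch below, the enum names and enumerators are hypothetical placeholders rather than the actual `vtkm::cont::Field` definitions.

```cpp
#include <type_traits>

namespace sketch
{
// Hypothetical unscoped enum: silently converts to int.
enum LooseAssociation
{
  LOOSE_POINTS,
  LOOSE_CELL_SET
};

// Hypothetical scoped (strongly typed) enum: no implicit conversion.
enum class Association
{
  POINTS,
  CELL_SET
};

static_assert(std::is_convertible<LooseAssociation, int>::value,
              "unscoped enums convert implicitly");
static_assert(!std::is_convertible<Association, int>::value,
              "scoped enums do not convert implicitly");

// An explicit cast is still available when the integer value is really needed.
constexpr int ToInt(Association a)
{
  return static_cast<int>(a);
}
} // namespace sketch
```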
# Add new option to VTKm_CUDA_Architecture
A new VTKm_CUDA_Architecture option called 'none' has been added. This will
disable all VTK-m generated cuda architecture flags, allowing the user to
specify their own custom flags.
This is useful when VTK-m is used as a library in another project and that project wants
to use its own architecture flags.
# Worklets are now asynchronous in Cuda
Worklets are now fully asynchronous in the cuda backend. This means that
worklet errors are reported asynchronously. Existing errors are checked for
before invocation of a new worklet and at explicit synchronization points like
`DeviceAdapterAlgorithm<>::Synchronize()`.
An important effect of this change is that functions that are synchronization
points, like `ArrayHandle::GetPortalControl()` and
`ArrayHandle::GetPortalConstControl()`, may now throw exceptions for errors from
previously executed worklets.
Worklet invocations, synchronization and error reporting happen independently
on different threads. Therefore, synchronization on one thread does not affect
any other threads.
# Add support for deferred freeing of cuda memory
A new function, `void CudaAllocator::FreeDeferred(void* ptr, std::size_t numBytes)` has
been added that can be used to defer the freeing of cuda memory to a later point.
This is useful because `cudaFree` causes a global sync across all cuda streams. This function
internally maintains a pool of to-be-freed pointers that are freed together when a
size threshold is reached. This way a number of global syncs are collected together at
one point.
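The batching idea can be sketched independently of CUDA. The `sketch::DeferredFreePool` class below is hypothetical: it takes the free function as a callback instead of calling `cudaFree`, it is not thread safe, and the real `CudaAllocator` may behave differently.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

namespace sketch
{
// Hypothetical pool that batches frees until a byte threshold is reached.
class DeferredFreePool
{
public:
  DeferredFreePool(std::size_t thresholdBytes, std::function<void(void*)> freeFunction)
    : Threshold(thresholdBytes), Free(std::move(freeFunction))
  {
  }
  ~DeferredFreePool() { this->Flush(); }

  // Queue a pointer; only flush once enough bytes are pending.
  void FreeDeferred(void* ptr, std::size_t numBytes)
  {
    this->Pending.push_back(ptr);
    this->PendingBytes += numBytes;
    if (this->PendingBytes >= this->Threshold)
    {
      this->Flush();
    }
  }

  // Release everything queued so far in one batch.
  void Flush()
  {
    for (void* ptr : this->Pending)
    {
      this->Free(ptr);
    }
    this->Pending.clear();
    this->PendingBytes = 0;
  }

  std::size_t GetPendingCount() const { return this->Pending.size(); }

private:
  std::size_t Threshold;
  std::function<void(void*)> Free;
  std::vector<void*> Pending;
  std::size_t PendingBytes = 0;
};
} // namespace sketch
```

With a 100-byte threshold, two 40-byte requests are only queued; the third pushes the total past the threshold and all three pointers are released back to back.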
# VTK-m Worklets now execute on Cuda using grid stride loops
Previously VTK-m Worklets used what is referred to as a monolithic kernel
pattern for worklet execution. This assumes a single large grid of threads
to process an entire array in a single pass. This resulted in launches that
looked like:
```cpp
template<typename F>
void TaskSingular(F f, vtkm::Id end)
{
  const vtkm::Id index = static_cast<vtkm::Id>(blockDim.x * blockIdx.x + threadIdx.x);
  if (index < end)
  {
    f(index);
  }
}

Schedule1DIndexKernel<TaskSingular><<<totalBlocks, 128, 0, cudaStreamPerThread>>>(
  functor, numInstances);
```
This was problematic as it had the drawbacks of:
- Not being able to reuse any infrastructure between kernel executions.
- Harder to tune performance based on the current hardware.
The solution was to move to a grid stride loop strategy with a block size
based on the number of SMs on the executing GPU. The result is something
that looks like:
```cpp
template<typename F>
void TaskStrided(F f, vtkm::Id end)
{
  const vtkm::Id start = blockIdx.x * blockDim.x + threadIdx.x;
  const vtkm::Id inc = blockDim.x * gridDim.x;
  for (vtkm::Id index = start; index < end; index += inc)
  {
    f(index);
  }
}

Schedule1DIndexKernel<TaskStrided><<<32*numSMs, 128, 0, cudaStreamPerThread>>>(
  functor, numInstances);
```
With a loop stride equal to grid size we maintain the optimal memory
coalescing patterns as we had with the monolithic version. These changes
also allow VTK-m to optimize TaskStrided so that it can reuse infrastructure
between iterations.
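The indexing of a grid stride loop can be checked on the CPU. The sketch below emulates every thread of a hypothetical launch sequentially and counts how often each index is visited; `sketch::CountVisits` and `sketch::AllVisitedOnce` are illustrative helpers, not VTK-m or CUDA functions.

```cpp
#include <cstddef>
#include <vector>

namespace sketch
{
// Emulate a grid-stride launch on the CPU: each simulated thread starts at its
// global id and advances by the total number of threads in the grid.
inline std::vector<int> CountVisits(std::size_t numBlocks, std::size_t blockSize, std::size_t end)
{
  std::vector<int> visits(end, 0);
  const std::size_t stride = numBlocks * blockSize; // blockDim.x * gridDim.x
  for (std::size_t block = 0; block < numBlocks; ++block)
  {
    for (std::size_t thread = 0; thread < blockSize; ++thread)
    {
      const std::size_t start = block * blockSize + thread;
      for (std::size_t index = start; index < end; index += stride)
      {
        ++visits[index];
      }
    }
  }
  return visits;
}

// True when every index was processed exactly once.
inline bool AllVisitedOnce(const std::vector<int>& visits)
{
  for (int count : visits)
  {
    if (count != 1)
    {
      return false;
    }
  }
  return true;
}
} // namespace sketch
```

Every index is covered exactly once whether the array is larger or smaller than the grid, which is why the grid size can be chosen from the hardware rather than from the array length.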
# DeviceAdapterId has become a real constexpr type and not an alias to vtkm::UInt8
As part of the ability to support `vtkm::cont::TryExecuteOnDevice` VTK-m has made the
DeviceAdapterId a real constexpr type instead of a vtkm::UInt8.
The benefits of a real type are as follows:
- Easier to add functionality like range verification, which previously had
  to be located in each user of `DeviceAdapterId`
- No more ambiguous arguments. Previously it wasn't perfectly clear
  what a method parameter of `vtkm::UInt8` represented. Was it actually the
  DeviceAdapterId or something else?
- Ability to add subclasses that represent things such as Undefined, Error, or Any.
The implementation of DeviceAdapterId is:
```cpp
struct DeviceAdapterId
{
  constexpr explicit DeviceAdapterId(vtkm::Int8 id)
    : Value(id)
  {
  }

  constexpr bool operator==(DeviceAdapterId other) const { return this->Value == other.Value; }
  constexpr bool operator!=(DeviceAdapterId other) const { return this->Value != other.Value; }
  constexpr bool operator<(DeviceAdapterId other) const { return this->Value < other.Value; }

  constexpr bool IsValueValid() const
  {
    return this->Value > 0 && this->Value < VTKM_MAX_DEVICE_ADAPTER_ID;
  }

  constexpr vtkm::Int8 GetValue() const { return this->Value; }

private:
  vtkm::Int8 Value;
};
```
# DeviceAdapterTags are usable for runtime device selection
VTK-m DeviceAdapterTags are now both a compile-time representation of which device to use and
the runtime representation of that device. Previously the runtime representation was handled
by `vtkm::cont::DeviceAdapterId`. This was done by making each `DeviceAdapterTag` a constexpr type that
inherits from the constexpr `vtkm::cont::DeviceAdapterId` type.
At a ten-thousand-foot level, this change means that in general, instead of using `vtkm::cont::DeviceAdapterTraits<DeviceTag>`,
you can simply use `DeviceTag` or an instance of it, such as `DeviceTag runtimeDeviceId;`.
Previously if you wanted to get the runtime representation of a device you would do the following:
```cpp
template<typename DeviceTag>
vtkm::cont::DeviceAdapterId getDeviceId()
{
using Traits = vtkm::cont::DeviceAdapterTraits<DeviceTag>;
return Traits::GetId();
}
...
vtkm::cont::DeviceAdapterId runtimeId = getDeviceId<DeviceTag>();
```
Now with the updates you could do the following.
```cpp
vtkm::cont::DeviceAdapterId runtimeId = DeviceTag();
```
More importantly, this conversion is unnecessary, as you can pass instances of `DeviceAdapterTags` into methods or functions
that want a `vtkm::cont::DeviceAdapterId`, because they are that type!
Previously, if you wanted to see if a DeviceAdapter was enabled, you would do the following:
```cpp
using Traits = vtkm::cont::DeviceAdapterTraits<DeviceTag>;
constexpr auto isValid = std::integral_constant<bool, Traits::Valid>();
```
Now you would do:
```cpp
constexpr auto isValid = std::integral_constant<bool, DeviceTag::IsEnabled>();
```
So why did VTK-m make these changes?
That is a good question, and the answer is twofold. The VTK-m project is working toward better support for ArrayHandles that leverage runtime polymorphism (aka virtuals), and toward the ability to construct `vtkm::worklet::Dispatchers` without specifying
the explicit device they should run on. Both of these designs push more of the VTK-m logic to operate at runtime rather than compile time. These changes are designed to allow for consistent object usage between runtime and compile time instead of having
to convert between compile-time and runtime types.
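The inheritance relationship can be sketched in a few lines of standalone C++. The types below (`sketch::DeviceId`, `sketch::SerialTag`, `sketch::CudaTag`) and the id values are hypothetical simplifications, not the VTK-m definitions; they only show why a tag can be passed wherever a runtime id is expected.

```cpp
#include <cstdint>

namespace sketch
{
// Hypothetical runtime id type, modeled on the constexpr struct described earlier.
struct DeviceId
{
  constexpr explicit DeviceId(std::int8_t id) : Value(id) {}
  constexpr bool operator==(DeviceId other) const { return this->Value == other.Value; }
  constexpr std::int8_t GetValue() const { return this->Value; }

private:
  std::int8_t Value;
};

// Hypothetical tags: compile-time types that are also runtime ids.
struct SerialTag : DeviceId
{
  static constexpr bool IsEnabled = true;
  constexpr SerialTag() : DeviceId(1) {}
};
struct CudaTag : DeviceId
{
  static constexpr bool IsEnabled = false; // pretend CUDA was not built
  constexpr CudaTag() : DeviceId(2) {}
};

// A runtime API only needs the base type; tags convert to it implicitly.
constexpr bool IsSerial(DeviceId id)
{
  return id == DeviceId(1);
}

static_assert(IsSerial(SerialTag{}), "a tag can be passed where a runtime id is expected");
static_assert(!IsSerial(CudaTag{}), "a different tag carries a different id");
} // namespace sketch
```

The same object therefore serves template dispatch (through its type) and runtime checks (through its value), without a traits lookup in between.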
# Make DispatcherBase invoke using a TryExecute
Rather than force all dispatchers to be templated on a device adapter,
instead use a TryExecute internally within the invoke to select a device
adapter.
Because this removes the need to declare a device when invoking a worklet,
this commit also removes the need to declare a device in several other
areas of the code.
This change touches quite a bit of code. The first pass of the change
usually does the minimum amount of work, which is to change the
compile-time specification of the device to a run-time call to `SetDevice`
on the dispatcher. Although functionally equivalent, it might mean calling
`TryExecute` within itself.
# VTK-m now supports dispatcher parameters being pointers
Previously it was only possible to pass values to a dispatcher when
you wanted to invoke a VTK-m worklet. This caused problems when it came
to designing new types that used inheritance, as the types couldn't be
passed as the base type to the dispatcher. To fix this issue we now
support invoking worklets with pointers, as seen below.
```cpp
vtkm::cont::ArrayHandle<T> input;
//fill input
vtkm::cont::ArrayHandle<T> output;
vtkm::worklet::DispatcherMapField<WorkletType> dispatcher;
dispatcher.Invoke(&input, output);