Commit 774d7a56 authored by Robert Maynard

Add release notes for v1.4.0
# `StorageBasic` StealArray() now provides delete function to new owner
Memory that is stolen from VTK-m has to be freed correctly, as it could have
been allocated with `new`, `malloc`, or even `cudaMallocManaged`.
Previously it was very easy to transfer ownership of memory out of VTK-m and
either fail to capture the free function or ask for it after the transfer
operation, which would return a nullptr. Now stealing an array also
provides the free function, removing one source of memory leaks.
To properly steal memory from VTK-m you do the following:
```cpp
vtkm::cont::ArrayHandle<T> arrayHandle;
...
// StealArray() returns both the raw pointer and its matching free function
auto stolen = arrayHandle.StealArray();
T* ptr = stolen.first;
auto free_function = stolen.second;
...
free_function(ptr);
```
# VariantArrayHandle::AsVirtual<T>() performs casting
The `AsVirtual<T>` method of `VariantArrayHandle` now works for any arithmetic type,
not just the actual value type of the underlying array. This works by inserting an
`ArrayHandleCast` between the underlying concrete array and the new
`ArrayHandleVirtual` when needed.
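For example (a minimal sketch; the concrete value types and variable names are
illustrative, not taken from the VTK-m tests):
```cpp
vtkm::cont::ArrayHandle<vtkm::Int32> concrete;
// ... fill concrete ...
vtkm::cont::VariantArrayHandle variant(concrete);

// The underlying array stores Int32, but a Float32 virtual handle can now be
// requested; an ArrayHandleCast is inserted behind the scenes.
vtkm::cont::ArrayHandleVirtual<vtkm::Float32> asFloat =
  variant.AsVirtual<vtkm::Float32>();
```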
# VTK-m logs details about each CUDA kernel launch
The VTK-m logging infrastructure has been extended with a new log level
`KernelLaunches` which exists between `MemTransfer` and `Cast`.
This log level reports the number of blocks, threads per block, and the
PTX version of each CUDA kernel launched.
This logging level was primarily introduced to help developers who are
tracking down issues that occur when VTK-m components have been built with
different `sm_XX` flags, and to help people doing kernel performance tuning.
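For instance, the new level can be enabled programmatically (a minimal sketch;
it assumes the `SetStderrLogLevel` helper from `vtkm/cont/Logging.h` and that
logging support was compiled in):
```cpp
#include <vtkm/cont/Logging.h>

// Raise stderr verbosity so KernelLaunches messages (blocks, threads per
// block, PTX version) are emitted for each CUDA kernel launch.
vtkm::cont::SetStderrLogLevel(vtkm::cont::LogLevel::KernelLaunches);
```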
# VTK-m provides a vtkm_filter target
VTK-m now provides a `vtkm_filter` target that contains pre-built components
of filters for consuming projects.
# Make ArrayHandleVirtual conform with other ArrayHandle structure
Previously, ArrayHandleVirtual was defined as a specialization of
ArrayHandle with the virtual storage tag. This was because the storage
object was polymorphic and needed to be handled specially. These changes
move the existing storage definition to an internal class and
manage the pointer to that implementation class in a Storage object that
can be managed like any other storage object.
The implementation of StorageAny has also been moved into the implementation
of the internal storage object.
# Add vtkm::cont::ArrayHandleVirtual
Added a new class named `ArrayHandleVirtual` that allows you to type-erase an
ArrayHandle's storage type by using virtual calls. This simplification makes
storing `Fields` and `Coordinates` significantly easier, as VTK-m doesn't
need to deduce both the storage and value type when executing worklets.
To construct an `ArrayHandleVirtual` one can do one of the following:
```cpp
vtkm::cont::ArrayHandle<vtkm::Float32> pressure;
vtkm::cont::ArrayHandleConstant<vtkm::Float32> constant(42.0f);
// construct from an array handle
vtkm::cont::ArrayHandleVirtual<vtkm::Float32> v(pressure);
// or assign from an array handle
v = constant;
```
To help maintain performance `ArrayHandleVirtual` provides a collection of helper
functions/methods to query and cast back to the concrete storage and value type:
```cpp
vtkm::cont::ArrayHandleConstant<vtkm::Float32> constant(42.0f);
vtkm::cont::ArrayHandleVirtual<vtkm::Float32> v = constant;
bool isConstant = vtkm::cont::IsType< decltype(constant) >(v);
if (isConstant)
{
  vtkm::cont::ArrayHandleConstant<vtkm::Float32> t = vtkm::cont::Cast< decltype(constant) >(v);
}
```
Lastly, calling code using `ArrayHandleVirtual` commonly needs to construct a new instance
of an existing virtual handle with the same storage type. This can be done with the
`NewInstance` method, as seen below.
```cpp
vtkm::cont::ArrayHandle<vtkm::Float32> pressure;
vtkm::cont::ArrayHandleVirtual<vtkm::Float32> v = pressure;
vtkm::cont::ArrayHandleVirtual<vtkm::Float32> newArray = v.NewInstance();
bool isConstant = vtkm::cont::IsType< vtkm::cont::ArrayHandle<vtkm::Float32> >(newArray); //will be true
```
# vtkm::cont::ArrayHandleZip provides a consistent API even with non-writable handles
Previously ArrayHandleZip could not wrap an implicit handle and provide a consistent experience.
The primary issue was that if you tried to use the PortalType returned by GetPortalControl() you
would get a compile failure. This would occur as the PortalType returned would try to call `Set`
on an ImplicitPortal, which doesn't have a `Set` method.
Now with this change, the `ZipPortal` uses SFINAE to determine whether `Set` and `Get` should call the
underlying zipped portals.
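A rough sketch of the now-supported pattern (the value types and array size
are illustrative):
```cpp
vtkm::cont::ArrayHandle<vtkm::Float32> writable;
writable.Allocate(10);
vtkm::cont::ArrayHandleConstant<vtkm::Float32> implicit(42.0f, 10);

// Zipping a writable handle with a read-only implicit handle now compiles.
// Get works on both components of each pair; Set is only enabled where the
// underlying portal actually provides it.
auto zipped = vtkm::cont::make_ArrayHandleZip(writable, implicit);
auto firstPair = zipped.GetPortalConstControl().Get(0);
```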
# Introduce asynchronous and device independent timer
The timer class is now asynchronous and device independent. It uses an API
similar to vtkOpenGLRenderTimer, with Start(), Stop(), Reset(), Ready(),
and GetElapsedTime() functions. For convenience and backward compatibility, each
Start() call invokes Reset() internally. GetElapsedTime() can be called
multiple times to time sequential operations, and Stop() can be helpful when
you want to retrieve the elapsed time later.
Basically it can be used in two modes:
* Create a Timer without any device info.
  * This enables the timer for all enabled devices on the machine. Users can get a
    specific elapsed time by passing a device id into the GetElapsedTime function.
    If no device is provided, it picks the maximum of all timer results; the
    logic behind this decision is that if CUDA is disabled, OpenMP, Serial, and TBB
    give roughly the same results, while if CUDA is enabled it is safe to return the
    maximum elapsed time since users are more interested in the device execution
    time than in the kernel launch time. The Ready function can be handy here
    to query the status of the timer.
```cpp
// Construct a generic timer
// Assume CUDA is enabled on the machine
vtkm::cont::Timer timer;
timer.Start();
// Run the algorithm
auto timeHost = timer.GetElapsedTime(vtkm::cont::DeviceAdapterTagSerial());
// To avoid the expensive device synchronization, we query IsReady here.
if (timer.IsReady())
{
  auto timeDevice = timer.GetElapsedTime(vtkm::cont::DeviceAdapterTagCuda());
}
// Force the synchronization. Ideally the device execution time would be
// returned, which takes longer than the kernel launch.
auto timeGeneral = timer.GetElapsedTime();
```
* Create a Timer with a specific device.
  * This works like the old timer that times a specific device id.
```cpp
// Construct a device-specific timer
// Assume TBB is enabled on the machine
vtkm::cont::Timer timer{vtkm::cont::DeviceAdapterTagTBB()};
timer.Start(); // t0
// Run the algorithm
// The timer would just return 0 and warn the user via the logger that an
// invalid device was used to query elapsed time
auto timeInvalid = timer.GetElapsedTime(vtkm::cont::DeviceAdapterTagSerial());
if (timer.IsReady())
{
  // Either will work and mark t1, returning t1-t0
  auto time1TBB = timer.GetElapsedTime(vtkm::cont::DeviceAdapterTagTBB());
  auto time1General = timer.GetElapsedTime();
}
// Do something
auto time2 = timer.GetElapsedTime(); // t2 will be marked and t2-t0 will be returned
// Do something
timer.Stop(); // t3 marked
// Do something, then summarize later
auto timeFinal = timer.GetElapsedTime(); // t3-t0
```
# Add support for BitFields.
BitFields are:
- Stored in memory using a contiguous buffer of bits.
- Accessible via portals, a la ArrayHandle.
- Portals operate on individual bits or words.
- Operations may be atomic for safe use from concurrent kernels.
The new BitFieldToUnorderedSet device algorithm produces an
ArrayHandle containing the indices of all set bits, in no particular
order.
The new AtomicInterface classes provide an abstraction for bitwise
atomic operations across the control and execution environments and are
used to implement the BitPortals.
BitFields may be used as boolean-typed ArrayHandles using the
ArrayHandleBitField adapter. ArrayHandleBitField uses atomic operations to read
and write bits in the BitField, and is safe to use in concurrent code.
For example, a simple worklet that merges two arrays based on a boolean
condition is tested in TestingBitField:
```cpp
class ConditionalMergeWorklet : public vtkm::worklet::WorkletMapField
{
public:
  using ControlSignature = void(FieldIn cond,
                                FieldIn trueVals,
                                FieldIn falseVals,
                                FieldOut result);
  using ExecutionSignature = _4(_1, _2, _3);

  template <typename T>
  VTKM_EXEC T operator()(bool cond, const T& trueVal, const T& falseVal) const
  {
    return cond ? trueVal : falseVal;
  }
};

BitField bits = ...;
auto condArray = vtkm::cont::make_ArrayHandleBitField(bits);
auto trueArray = vtkm::cont::make_ArrayHandleCounting<vtkm::Id>(20, 2, NUM_BITS);
auto falseArray = vtkm::cont::make_ArrayHandleCounting<vtkm::Id>(13, 2, NUM_BITS);
vtkm::cont::ArrayHandle<vtkm::Id> output;

vtkm::worklet::DispatcherMapField<ConditionalMergeWorklet> dispatcher;
dispatcher.Invoke(condArray, trueArray, falseArray, output);
```
# Put CellLocatorBoundingIntervalHierarchy in vtkm_cont library
All of the methods in CellLocatorBoundingIntervalHierarchy were implemented in
header files. This is sometimes problematic with virtual methods. Since
everything it implements can just be embedded in a library, the code has been
moved into the vtkm_cont library.
These changes caused some warnings in clang to show up based on virtual
methods in other cell locators. Hence, the rest of the cell locators
have also had some of their code moved to vtkm_cont.
# VTK-m `vtkm::cont::DeviceAdapterId` construction from string is now case-insensitive
You can now construct a `vtkm::cont::DeviceAdapterId` from a string regardless of
its case. The following will all construct the same `vtkm::cont::DeviceAdapterId`.
```cpp
vtkm::cont::DeviceAdapterId id1 = vtkm::cont::make_DeviceAdapterId("cuda");
vtkm::cont::DeviceAdapterId id2 = vtkm::cont::make_DeviceAdapterId("CUDA");
vtkm::cont::DeviceAdapterId id3 = vtkm::cont::make_DeviceAdapterId("Cuda");
auto& tracker = vtkm::cont::GetGlobalRuntimeDeviceTracker();
vtkm::cont::DeviceAdapterId id4 = tracker.GetDeviceAdapterId("cuda");
vtkm::cont::DeviceAdapterId id5 = tracker.GetDeviceAdapterId("CUDA");
vtkm::cont::DeviceAdapterId id6 = tracker.GetDeviceAdapterId("Cuda");
```
# Allow VariantArrayHandle CastAndCall to cast to concrete types
Previously, the `VariantArrayHandle::CastAndCall` (and indirect calls through
`vtkm::cont::CastAndCall`) attempted to cast to only
`vtkm::cont::ArrayHandleVirtual` with different value types. That worked, but
it meant that whatever was called had to operate through virtual functions.
Under most circumstances, it is worthwhile to also check for some common
storage types that, when encountered, can be accessed much faster. This
change provides the casting to concrete storage types and now uses
`vtkm::cont::ArrayHandleVirtual` as a fallback when no concrete storage
type is found.
By default, `CastAndCall` checks all the storage types in
`VTKM_DEFAULT_STORAGE_LIST_TAG`, which typically contains only the basic
storage. The `VariantArrayHandle::CastAndCall` method also allows you to
override this behavior by specifying a different type list in the first
argument. If the first argument is a list type, `CastAndCall` assumes that
all the types in the list are storage tags. If you pass in
`vtkm::ListTagEmpty`, then `CastAndCall` will always cast to an
`ArrayHandleVirtual` (the previous behavior). Alternately, you can pass in
storage tags that might be likely under the current usage.
As an example, consider the following simple code.
``` cpp
vtkm::cont::VariantArrayHandle array;
// stuff happens
array.CastAndCall(myFunctor);
```
Previously, `myFunctor` would be called with
`vtkm::cont::ArrayHandleVirtual<T>` with different type `T`s. After this
change, `myFunctor` will be called with that and with
`vtkm::cont::ArrayHandle<T>` of the same type `T`s.
If you want to only call `myFunctor` with
`vtkm::cont::ArrayHandleVirtual<T>`, then replace the previous line with
``` cpp
array.CastAndCall(vtkm::ListTagEmpty(), myFunctor);
```
Let's say that additionally using `vtkm::cont::ArrayHandleIndex` was also
common. If you want to also specialize for that array, you can do so with
the following line.
``` cpp
array.CastAndCall(vtkm::ListTagBase<vtkm::cont::StorageTagBasic,
                                    vtkm::cont::ArrayHandleIndex::StorageTag>(),
                  myFunctor);
```
Note that `myFunctor` will be called with
`vtkm::cont::ArrayHandle<T,vtkm::cont::ArrayHandleIndex::StorageTag>`, not
`vtkm::cont::ArrayHandleIndex`.
# CMake 3.8 Required to build VTK-m
While VTK-m has always required a fairly recent version
of CMake when building for Visual Studio, or when OpenMP or
CUDA are enabled, it has supported building with the TBB
device with CMake as old as 3.3.
Given that our primary consumer (VTK) has moved
to require CMake 3.8, it doesn't make sense to keep supporting
CMake 3.3, so we have moved to a minimum of 3.8.
# Add connected component worklets and filters
We have added the `ImageConnectivity` and `CellSetConnectivity` worklets and
the corresponding filters to identify connected components in a DataSet.
`ImageConnectivity` identifies connected components in a CellSetStructured based on
neighboring cells having the same field value, and `CellSetConnectivity` identifies
connected components based on cell connectivity.
Currently a Moore neighborhood (i.e., the 8 surrounding pixels in 2D and the 26
surrounding voxels in 3D) is used for `ImageConnectivity`. For `CellSetConnectivity`,
the neighborhood is defined as cells sharing a common edge.
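A rough usage sketch follows (it assumes an existing `vtkm::cont::DataSet` named
`dataSet` with a point field named "color"; the field name and header path are
illustrative, following the usual VTK-m filter pattern):
```cpp
#include <vtkm/filter/ImageConnectivity.h>

// Label connected regions of a structured data set whose neighboring pixels
// share the same value in the point field named "color".
vtkm::filter::ImageConnectivity connectivity;
connectivity.SetActiveField("color");
vtkm::cont::DataSet labeled = connectivity.Execute(dataSet);
```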
# CudaAllocator Managed Memory can be disabled from C++
Previously it was impossible for calling code to explicitly
disable managed memory. Being able to do so can be desirable for projects
that know they don't need managed memory and are highly
performance critical.
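A sketch of what disabling it from C++ might look like; the exact header,
namespace, and method name (`ForceManagedMemoryOff` on the internal
`CudaAllocator`) should be treated as assumptions:
```cpp
#include <vtkm/cont/cuda/internal/CudaAllocator.h>

// Assumed helper: opt this process out of CUDA managed memory before any
// VTK-m device allocations happen.
vtkm::cont::cuda::internal::CudaAllocator::ForceManagedMemoryOff();
```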
# VTK-m now requires CUDA separable compilation to build
With the introduction of `vtkm::cont::ArrayHandleVirtual` and the related infrastructure, VTK-m now
requires that all CUDA code be compiled using separable compilation (`-rdc`).
# Remove templates from ControlSignature field tags
Previously, several of the `ControlSignature` tags had a template to
specify a type list. This was to specify potential valid value types for an
input array. The importance of this typelist was to limit the number of
code paths created when resolving a `vtkm::cont::VariantArrayHandle`
(formerly a `DynamicArrayHandle`). This (potentially) reduced the compile
time, the size of libraries/executables, and errors from unexpected types.
Much has changed since this feature was originally implemented. Since then,
the filter infrastructure has been created, and it is through this that
most dynamic worklet invocations happen. However, since the filter
infrastructure does its own type resolution (and has its own policies), the
type arguments in `ControlSignature` are now of little value.
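As an illustration of the change (the signature below is a generic sketch, not
a specific VTK-m worklet):
```cpp
// Before: field tags carried a template parameter restricting value types.
using ControlSignature = void(FieldIn<Scalar> input, FieldOut<Scalar> output);

// After: the template parameter is removed from the tag.
using ControlSignature = void(FieldIn input, FieldOut output);
```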
## Script to update code
This update requires changes to just about all code implementing a VTK-m
worklet. To facilitate updating this code (not to
mention all the code in VTK-m), a script is provided to automatically remove
these template parameters from VTK-m code.
This script is at
[Utilities/Scripts/update-control-signature-tags.sh](../../Utilities/Scripts/update-control-signature-tags.sh).
It needs to be run in a Unix-compatible shell. It takes a single argument,
which is a top level directory to modify files. The script processes all C++
source files recursively from that directory.
## Selecting data types for auxiliary filter fields
The main rationale for making these changes is that the types of the inputs
to worklets are almost always already determined by the calling filter.
However, although it is straightforward to specify the type of the "main"
(active) scalars in a filter, it is less clear what to do for additional
fields if a filter needs a second or third field.
Typically, in the case of a second or third field, it is up to the
`DoExecute` method in the filter implementation to apply a policy to that
field. When applying a policy, you give it a policy object (nominally
passed by the user) and a traits of the filter. Generally, the accepted
list of types for a field should be part of the filter's traits. For
example, consider the `WarpVector` filter. This filter only works on
`Vec`s of size 3, so its traits class looks like this.
``` cpp
template <>
class FilterTraits<WarpVector>
{
public:
  // WarpVector can only be applied to Float and Double Vec3 arrays
using InputFieldTypeList = vtkm::TypeListTagFieldVec3;
};
```
However, the `WarpVector` filter also requires two fields instead of one.
The first (active) field is handled by its superclass (`FilterField`), but
the second (auxiliary) field must be managed in the `DoExecute`. Generally,
this can be done by simply applying the policy with the filter traits.
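A rough sketch of what that looks like inside `DoExecute` (assuming an
`ApplyPolicy` overload that accepts the filter's traits; the field and policy
variable names are illustrative):
``` cpp
// Narrow the auxiliary field's types using the user's policy combined with
// this filter's traits before invoking the worklet.
vtkm::filter::FilterTraits<WarpVector> traits;
auto vectors = vtkm::filter::ApplyPolicy(vectorField, policy, traits);
```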
## The corner cases
Most of the calls to worklets happen within filter implementations, which
have their own way of narrowing down potential types (as previously
described). The majority of the remainder either use static types or work
with a variety of types.
However, there is a minority of corner cases that require a reduction of
types. Since the type arguments of the worklet `ControlSignature` are no
longer available, the narrowing of types must be done before the call to
`Invoke`.
This narrowing of arguments is not particularly difficult. Such type-unsure
arguments usually come from a `VariantArrayHandle` (or something that uses
one). You can select the types from a `VariantArrayHandle` simply by using
the `ResetTypes` method. For example, say you know that a variant array is
supposed to be a scalar.
``` cpp
dispatcher.Invoke(variantArray.ResetTypes(vtkm::TypeListTagFieldScalar()),
staticArray);
```
Even more common is to have a `vtkm::cont::Field` object. A `Field` object
internally holds a `VariantArrayHandle`, which is accessible via the
`GetData` method.
``` cpp
dispatcher.Invoke(field.GetData().ResetTypes(vtkm::TypeListTagFieldScalar()),
staticArray);
```
## Change in executable size
The whole intention of these template parameters in the first place was to
reduce the number of code paths compiled. The hypothesis of this change was
that in the current structure the code paths were not being reduced much
if at all. If that is true, the size of executables and libraries should
not change.
Here is a recording of the library and executable sizes before this change
(using `du -h`).
```
3.0M libvtkm_cont-1.2.1.dylib
6.2M libvtkm_rendering-1.2.1.dylib
312K Rendering_SERIAL
312K Rendering_TBB
22M Worklets_SERIAL
23M Worklets_TBB
22M UnitTests_vtkm_filter_testing
5.7M UnitTests_vtkm_cont_serial_testing
6.0M UnitTests_vtkm_cont_tbb_testing
7.1M UnitTests_vtkm_cont_testing
```
After the changes, the executable sizes are as follows.
```
3.0M libvtkm_cont-1.2.1.dylib
6.0M libvtkm_rendering-1.2.1.dylib
312K Rendering_SERIAL
312K Rendering_TBB
21M Worklets_SERIAL
21M Worklets_TBB
22M UnitTests_vtkm_filter_testing
5.6M UnitTests_vtkm_cont_serial_testing
6.0M UnitTests_vtkm_cont_tbb_testing
7.1M UnitTests_vtkm_cont_testing
```
As we can see, the built sizes have not changed significantly. (If
anything, the build is a little smaller.)
# VTK-m CUDA kernel scheduling including improved defaults, and user customization
VTK-m now offers a more GPU-aware set of defaults for kernel scheduling.
When VTK-m first launches a kernel, we do system introspection to determine
what GPUs are on the machine and then match this information to a preset
table of values. The implementation is designed in a way that allows
VTK-m to offer both specific presets for a given GPU (e.g., V100) and presets for
an entire generation of cards (e.g., Pascal).
Currently VTK-m offers preset tables for the following GPUs:
- Tesla V100
- Tesla P100
If the hardware doesn't match a specific GPU card, we then try to find the
nearest known hardware generation and use those defaults. Currently we offer
defaults for:
- Older than Pascal hardware
- Pascal hardware
- Volta+ hardware
Some users have workloads that don't align with the defaults provided by
VTK-m. When that is the case, it is possible to override the defaults
by binding a custom function to `vtkm::cont::cuda::InitScheduleParameters`,
as shown below:
```cpp
ScheduleParameters CustomScheduleValues(char const* name,
                                        int major,
                                        int minor,
                                        int multiProcessorCount,
                                        int maxThreadsPerMultiProcessor,
                                        int maxThreadsPerBlock)
{
  ScheduleParameters params {
    64 * multiProcessorCount, //1d blocks
    64,                       //1d threads per block
    64 * multiProcessorCount, //2d blocks
    { 8, 8, 1 },              //2d threads per block
    64 * multiProcessorCount, //3d blocks
    { 4, 4, 4 } };            //3d threads per block
  return params;
}

vtkm::cont::cuda::InitScheduleParameters(&CustomScheduleValues);
```
# vtkm::cont::Initialize
A new initialization function, `vtkm::cont::Initialize`, has been added.
Initialization is not required, but it will configure the logging utilities (when
enabled) and allows forcing a device via a `-d` or `--device` command line
option.
Usage:
```cpp
#include <vtkm/cont/Initialize.h>

int main(int argc, char* argv[])
{
  auto config = vtkm::cont::Initialize(argc, argv);
  ...
}
```