Commit c3db52da authored by Vicente Bolea, committed by Kitware Robot

Merge topic 'update-to-1.6.0-rc1'

19c1b99f 1.6.0-rc1 is our 7th official release of VTK-m.
4e94d830 Add release notes for v1.6.0
9d345733 Merge commit 'd2d1c854' into update-to-1.6.0-rc1
74ffad9b 1.5.1 is a bugfix release for VTK-m.
f67dcfc3 Add release notes for v1.5.1
38af6dbf add Particle.h and Deprecated.h to the installed headers
d5f50c5a 3b7b21c8 Do not use std::is_trivially_copyable on GCC 4.X
cb3d6e5c Variant as trivially copyable
...
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Kenneth Moreland <kmorel@acm.org>
Merge-request: !2457
parents b00f5b44 19c1b99f
VTK-m 1.5.1 Release Notes
=======================
# Table of Contents
1. [Core](#Core)
- Variant trivial copy
2. [Build](#Build)
- GCC-4 trivially copyable
- MSVC flag fix
- GCC OpenMP workaround fix
- Correct GCC-9 warnings
- GCC-4.8 iterator fixes
- Aligned union check handle GCC 4.8.5
- Update check for aligned union
- Intel fix
- Correct MSVC 2015 failure
- GCC-6.1 ICE OpenMP and optimizations
- No deprecated NVCC VS
# Core
## Variant trivial copy
The Variant template can hold any type. If it is holding a type that is
non-copyable, then it has to make sure that appropriate constructors,
copiers, movers, and destructors are called.
Previously, these were called even when the Variant was holding a trivially
copyable class, on the theory of no harm, no foul. If you were holding a trivially
copyable class and did a `memcpy`, that would work, which should make it
possible to copy between host and device, right?
In theory yes, but in practice no. The problem is that _Cuda_ is
outsmarting the code. It is checking that Variant is not trivially-
copyable by C++ semantics and refusing to push it.
So, change Variant to check whether all of its supported classes are
trivially copyable. If they are, then it uses the default constructors,
destructors, movers, and copiers so that C++ recognizes it as trivially
copyable.
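As a hedged illustration of the mechanism (the trait name below is hypothetical, not VTK-m's actual internals), the decision can be expressed as a compile-time check over every type the Variant can hold:
```cpp
#include <type_traits>

// Hypothetical sketch: true only when every type the Variant can hold is
// trivially copyable.
template <typename... Ts>
struct AllTriviallyCopyable;

template <>
struct AllTriviallyCopyable<> : std::true_type
{
};

template <typename T0, typename... Ts>
struct AllTriviallyCopyable<T0, Ts...>
  : std::integral_constant<bool,
                           std::is_trivially_copyable<T0>::value &&
                             AllTriviallyCopyable<Ts...>::value>
{
};

// A Variant<Ts...> can then select between two implementations: one with
// defaulted (trivial) copy/move/destructor, which C++ and CUDA recognize as
// trivially copyable, and one that dispatches to the held type's members.
```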
7518d067 Try to fix uninitialized anonymous variable warning
5b18ffd7 Register Variant as trivially copyable if possible
16305bd8 Add tests of ArrayHandleMultiplexer on multiple devices
Merge-request: https://gitlab.kitware.com/vtk/vtk-m/-/merge_requests/1898
# Build
## GCC-4 trivially copyable
Although GCC 4.8 and 4.9 claim to be C++11 compliant, there are a few
C++11 features they do not support. One of these features is
`std::is_trivially_copyable`. So on these platforms, do not attempt to use
it. Instead, treat nothing as trivially copyable.
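A minimal sketch of such a fallback, assuming a hypothetical `VtkmIsTriviallyCopyable` trait (the real VTK-m trait and macro names may differ):
```cpp
#include <type_traits>

// Hypothetical sketch: GCC 4.8/4.9 do not ship std::is_trivially_copyable,
// so conservatively report nothing as trivially copyable on those compilers.
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ < 5)
template <typename T>
struct VtkmIsTriviallyCopyable : std::false_type
{
};
#else
template <typename T>
struct VtkmIsTriviallyCopyable : std::is_trivially_copyable<T>
{
};
#endif
```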
3b7b21c8 Do not use std::is_trivially_copyable on GCC 4.X
Merge-request: https://gitlab.kitware.com/vtk/vtk-m/-/merge_requests/1900
## Fix MSVC flag
Fix MSVC flags for CUDA builds.
07b55a95 Fix MSVC flags for CUDA builds.
Merge-request: https://gitlab.kitware.com/vtk/vtk-m/-/merge_requests/1919
## GCC OpenMP workaround fix
There is some behavior of GCC compilers before GCC 9.0 that is
incompatible with the specification of `OpenMP` 4.0. The workaround was
previously being applied any time a GCC-compatible compiler was detected,
regardless of version. The proper behavior is to only use the workaround when
the GCC compiler is actually being used and the version of the compiler is
less than 9.0.
Also, switch to using `VTKM_GCC` to check for the GCC compiler instead of
`__GNUC__`. The problem with using `__GNUC__` is that many other compilers
pretend to be GCC by defining this macro, but in cases like compiler
workarounds it is not accurate.
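A hedged sketch of the gating logic (only `VTKM_GCC` comes from the text above; the workaround macro name is illustrative):
```cpp
// Hedged sketch: apply the workaround only when the compiler really is GCC
// (VTKM_GCC) and its version is below 9.0; __GNUC__ alone is unreliable since
// Clang and other compilers also define it.
#if defined(VTKM_GCC) && (__GNUC__ < 9)
#define VTKM_OPENMP_GCC_WORKAROUND 1
#else
#define VTKM_OPENMP_GCC_WORKAROUND 0
#endif
```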
033dfe55 Only workaround incorrect GCC behavior for OpenMP on GCC
Merge-request: https://gitlab.kitware.com/vtk/vtk-m/-/merge_requests/1904
## Correct GCC-9 warnings
870bd1d1 Removed unnecessary increment and decrement from ZFPDecode
f9860b84 Correct warnings found by gcc-9 in vtkm::Particle
Merge-request: https://gitlab.kitware.com/vtk/vtk-m/-/merge_requests/1935
## GCC-4.8 iterator fixes
The draft C++11 spec that GCC-4.X implemented against had some
defects that made implementing `void_t<...>` tricky.
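A commonly used workaround for this defect, shown here as a hedged sketch rather than VTK-m's exact code, routes the parameter pack through a class template so the compiler must actually evaluate it:
```cpp
// Hedged sketch: with an alias template alone, GCC 4.x would not reliably
// SFINAE on the unused arguments; a class template forces evaluation.
template <typename... Ts>
struct MakeVoid
{
  using type = void;
};

template <typename... Ts>
using void_t = typename MakeVoid<Ts...>::type;
```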
83d4d4e4 ArrayPortalToIterators now compiles with GCC-4.X
Merge-request: https://gitlab.kitware.com/vtk/vtk-m/-/merge_requests/1934
## Aligned union check handle GCC 4.8.5
b36846e4 UnitTestVariant uses VTKM_USING_GLIBCXX_4
cbf20ac3 Merge branch 'upstream-diy' into aligned_union_check_handle_gcc_485
ac1a23be diy 2019-12-17 (bb86e1f7)
201e5c81 Add the gcc 4.8.5 release date to our VTKM_USING_GLIBCXX_4 check
Merge-request: https://gitlab.kitware.com/vtk/vtk-m/-/merge_requests/1930
## Update check for aligned union
Fixes #447.
This uses a more robust set of checks to determine whether `std::aligned_union`
and `std::is_trivially_copyable` exist given the `libstdc++` version value.
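As a hedged sketch of the idea (the real check is more involved; only the `VTKM_USING_GLIBCXX_4` macro name comes from the commits below), the `__GLIBCXX__` date macro can be compared against the GCC 4.8.5 release date:
```cpp
// Hedged sketch: __GLIBCXX__ encodes the release date (YYYYMMDD) of the
// libstdc++ headers; dates up to the GCC 4.8.5 release are treated as a 4.x
// standard library missing std::aligned_union and std::is_trivially_copyable.
#if defined(__GLIBCXX__) && (__GLIBCXX__ <= 20150623) // GCC 4.8.5: 2015-06-23
#define VTKM_USING_GLIBCXX_4
#endif
```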
2e48d98d Merge branch 'upstream-diy' into update_check_for_aligned_union
bbd5db31 diy 2019-12-16 (e365b66a)
269261b9 Handle compiling against 4.X versions of libstdc++
Merge-request: https://gitlab.kitware.com/vtk/vtk-m/-/merge_requests/1928
## Intel fix
b6b20f08 Use brigand integer sequences on icc.
Merge-request: https://gitlab.kitware.com/vtk/vtk-m/-/merge_requests/1923
## Correct MSVC 2015 failure
f89672b7 UnitTestFetchArrayTopologyMapIn now compiles with VS2015
Merge-request: https://gitlab.kitware.com/vtk/vtk-m/-/merge_requests/1921
## GCC-6.1 ICE OpenMP and optimizations
dc86ac20 Avoid a GCC 6.1 compiler regression that occurs when openmp is enabled
Merge-request: https://gitlab.kitware.com/vtk/vtk-m/-/merge_requests/1950
## No deprecated NVCC VS
The NVCC compiler under Visual Studio seems to give the error "attribute does
not apply to any entity" when you try to use the `[[deprecated]]` attribute,
so disable it for this compiler configuration.
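A hedged sketch of the approach (the macro name is illustrative, not VTK-m's actual deprecation macro):
```cpp
// Hypothetical macro: expand the deprecation marker to nothing when compiling
// with nvcc under Visual Studio, where [[deprecated]] triggers
// "attribute does not apply to any entity".
#if defined(__CUDACC__) && defined(_MSC_VER)
#define VTKM_DEPRECATED_ATTR
#else
#define VTKM_DEPRECATED_ATTR [[deprecated]]
#endif

struct VTKM_DEPRECATED_ATTR OldInterface
{
};
```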
fb01d38a Disable deprecated attribute when using nvcc under VS
Merge-request: https://gitlab.kitware.com/vtk/vtk-m/-/merge_requests/1949
# Remove VTKDataSetWriter::WriteDataSet just_points parameter
In the method `VTKDataSetWriter::WriteDataSet`, the `just_points` parameter has
been removed due to lack of usage.
The purpose of `just_points` was to allow exporting only the points of a
DataSet without its cell data.
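A minimal usage sketch after the removal (header path and namespace assumed from VTK-m's io module):
```cpp
#include <vtkm/cont/DataSet.h>
#include <vtkm/io/VTKDataSetWriter.h>

void Save(const vtkm::cont::DataSet& dataSet)
{
  vtkm::io::VTKDataSetWriter writer("output.vtk");
  // Previously took an additional just_points argument; now only the DataSet.
  writer.WriteDataSet(dataSet);
}
```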
# Add Kokkos backend
Adds a new device backend `Kokkos` which uses the Kokkos library for parallelism.
Users must provide the Kokkos build, and VTK-m will use the default configured
execution space.
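A hedged sketch of selecting the new backend at runtime (device tag name and headers assumed; VTK-m must have been configured with Kokkos support):
```cpp
#include <vtkm/cont/DeviceAdapterTag.h>
#include <vtkm/cont/RuntimeDeviceTracker.h>

void UseKokkos()
{
  // Restrict execution to the Kokkos device adapter for the current thread.
  vtkm::cont::GetRuntimeDeviceTracker().ForceDevice(
    vtkm::cont::DeviceAdapterTagKokkos{});
}
```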
# Extract component arrays from unknown arrays
One of the problems with the data structures of VTK-m is that non-templated
classes like `DataSet`, `Field`, and `UnknownArrayHandle` (formerly
`VariantArrayHandle`) internally hold an `ArrayHandle` of a particular type
that has to be cast to the correct type before it can be reasonably used.
That in turn is problematic because the list of possible `ArrayHandle`
types is very long.
At one time we were trying to compensate for this by using
`ArrayHandleVirtual`. However, for technical reasons this class is
infeasible for every use case of VTK-m and has been deprecated. Also, this
was only a partial solution since using it still required different code
paths for, say, handling values of `vtkm::Float32` and `vtkm::Vec3f_32`
even though both are essentially arrays of 32-bit floats.
The extract component feature compensates for this problem by allowing you
to extract the components from an `ArrayHandle`. This feature allows you to
create a single code path to handle `ArrayHandle`s containing scalars or
vectors of any size. Furthermore, when you extract a component from an
array, the storage gets normalized so that one code path covers all storage
types.
## `ArrayExtractComponent`
The basic enabling feature is a new function named `ArrayExtractComponent`.
This function takes an `ArrayHandle` and an index to a component. It
then returns an `ArrayHandleStride` holding the selected component of each
entry in the original array.
We will get to the structure of `ArrayHandleStride` later. But the
important part is that `ArrayHandleStride` does _not_ depend on the storage
type of the original `ArrayHandle`. That means whether you extract a
component from `ArrayHandleBasic`, `ArrayHandleSOA`,
`ArrayHandleCartesianProduct`, or any other type, you get back the same
`ArrayHandleStride`. Likewise, regardless of whether the input
`ArrayHandle` has a `ValueType` of `FloatDefault`, `Vec2f`, `Vec3f`, or any
other `Vec` of a default float, you get the same `ArrayHandleStride`. Thus,
you can see how this feature can dramatically reduce code paths if used
correctly.
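As a hedged sketch (header location and exact signature assumed), extracting a component looks the same no matter what storage backs the input array:
```cpp
#include <vtkm/cont/ArrayExtractComponent.h>
#include <vtkm/cont/ArrayHandle.h>

void Example(const vtkm::cont::ArrayHandle<vtkm::Vec3f>& positions)
{
  // Extract the Y component (index 1); the result is an ArrayHandleStride of
  // the base component type regardless of how the input array is stored.
  vtkm::cont::ArrayHandleStride<vtkm::FloatDefault> yComponents =
    vtkm::cont::ArrayExtractComponent(positions, 1);
  (void)yComponents;
}
```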
It should be noted that `ArrayExtractComponent` will (logically) flatten
the `ValueType` before extracting the component. Thus, nested `Vec`s such
as `Vec<Vec3f, 3>` will be treated as a `Vec<FloatDefault, 9>`. The
intention is so that the extracted component will always be a basic C type.
For the purposes of this document when we refer to the "component type", we
really mean the base component type.
Different `ArrayHandle` implementations provide their own implementations
for `ArrayExtractComponent` so that the component can be extracted without
deep copying all the data. We will visit how `ArrayHandleStride` can
represent different data layouts later, but first let's go into the main
use case.
## Extract components from `UnknownArrayHandle`
The principal use case for `ArrayExtractComponent` is to get an
`ArrayHandle` from an unknown array handle without iterating over _every_
possible type. (Rather, we iterate over a smaller set of types.) To
facilitate this, an `ExtractComponent` method has been added to
`UnknownArrayHandle`.
To use `UnknownArrayHandle::ExtractComponent`, you must give it the
component type. You can check for the correct component type by using the
`IsBaseComponentType` method. The method will then return an
`ArrayHandleStride` for the component type specified.
### Example
As an example, let's say you have a worklet, `FooWorklet`, that does some
per component operation on an array. Furthermore, let's say that you want
to implement a function that, to the best of your ability, can apply
`FooWorklet` on an array of any type. This function should be pre-compiled
into a library so it doesn't have to be compiled over and over again.
(`MapFieldPermutation` and `MapFieldMergeAverage` are real and important
examples that have this behavior.)
Without the extract component feature, the implementation might look
something like this (many practical details left out):
``` cpp
struct ApplyFooFunctor
{
template <typename ArrayType>
void operator()(const ArrayType& input, vtkm::cont::UnknownArrayHandle& output) const
{
ArrayType outputArray;
vtkm::cont::Invoke invoke;
invoke(FooWorklet{}, input, outputArray);
output = outputArray;
}
};
vtkm::cont::UnknownArrayHandle ApplyFoo(const vtkm::cont::UnknownArrayHandle& input)
{
vtkm::cont::UnknownArrayHandle output;
input.CastAndCallForTypes<vtkm::TypeListAll, VTKM_DEFAULT_STORAGE_LIST_TAG>(
ApplyFooFunctor{}, output);
return output;
}
```
Take a look specifically at the `CastAndCallForTypes` call near the bottom
of this example. It calls for all types in `vtkm::TypeListAll`, which is
about 40 instances. Then, it needs to be called for any type in the desired
storage list. This could include basic arrays, SOA arrays, and lots of
other specialized types. It would be expected for this code to generate
over 100 paths for `ApplyFooFunctor`. This in turn contains a worklet
invoke, which is not a small amount of code.
Now consider how we can use the `ExtractComponent` feature to reduce the
code paths:
``` cpp
struct ApplyFooFunctor
{
template <typename T>
void operator()(T,
const vtkm::cont::UnknownArrayHandle& input,
vtkm::cont::UnknownArrayHandle& output) const
{
if (!input.IsBaseComponentType<T>()) { return; }
VTKM_ASSERT(output.IsBaseComponentType<T>());
vtkm::cont::Invoke invoke;
invoke(FooWorklet{}, input.ExtractComponent<T>(), output.ExtractComponent<T>());
}
};
vtkm::cont::UnknownArrayHandle ApplyFoo(const vtkm::cont::UnknownArrayHandle& input)
{
vtkm::cont::UnknownArrayHandle output = input.NewInstanceBasic();
output.Allocate(input.GetNumberOfValues());
vtkm::cont::ListForEach(ApplyFooFunctor{}, vtkm::TypeListScalarAll{}, input, output);
return output;
}
```
The number of lines of code is about the same, but take a look at the
`ListForEach` (which replaces the `CastAndCallForTypes`). This calling code
takes `TypeListScalarAll` instead of `TypeListAll`, which reduces the
instances created from around 40 to 13 (every basic C type). It is also no
longer dependent on the storage, so these 13 instances are it. As an
example of potential compile savings, changing the implementation of the
`MapFieldPermutation` and `MapFieldMergeAverage` functions in this way
reduced the filters_common library (on Mac, Debug build) by 24 MB (over a
third of the total size).
Another great advantage of this approach is that even though it takes less
time to compile and generates less code, it actually covers more cases.
Have an array containing values of `Vec<short, 13>`? No problem. The values
were actually stored in an `ArrayHandleReverse`? It will still work.
## `ArrayHandleStride`
This functionality is made possible with the new `ArrayHandleStride`. This
array behaves much like `ArrayHandleBasic`, except that it contains an
_offset_ parameter to specify where in the buffer array to start reading
and a _stride_ parameter to specify how many entries to skip for each
successive entry. `ArrayHandleStride` also has optional parameters
`divisor` and `modulo` that allow indices to be repeated at regular
intervals.
Here is how `ArrayHandleStride` extracts components from several common
arrays. For each of these examples, we assume that the `ValueType` of the
array is `Vec<T, N>` and that the component being extracted is _component_.
### Extracting from `ArrayHandleBasic`
When extracting from an `ArrayHandleBasic`, we just need to start at the
proper component and skip the length of the `Vec`.
* _offset_: _component_
* _stride_: `N`
### Extracting from `ArrayHandleSOA`
Since each component is held in a separate array, they are densely packed.
Each component could be represented by `ArrayHandleBasic`, but of course we
use `ArrayHandleStride` to keep the type consistent.
* _offset_: 0
* _stride_: 1
### Extracting from `ArrayHandleCartesianProduct`
This array is the basic reason for implementing the _divisor_ and _modulo_
parameters. Each of the 3 components has different parameters, which are
the following (given that _dims_[3] captures the size of the 3 arrays for
each dimension).
* _offset_: 0
* _stride_: 1
* case _component_ == 0
* _divisor_: _ignored_
* _modulo_: _dims_[0]
* case _component_ == 1
* _divisor_: _dims_[0]
* _modulo_: _dims_[1]
* case _component_ == 2
* _divisor_: _dims_[0]
* _modulo_: _ignored_
### Extracting from `ArrayHandleUniformPointCoordinates`
This array cannot be represented directly because it is fully implicit.
However, it can be trivially converted to an `ArrayHandleCartesianProduct`
using typically very little memory. (In fact, EAVL always represented uniform
point coordinates by explicitly storing a Cartesian product.) Thus, for
very little overhead the `ArrayHandleStride` can be created.
## Runtime overhead of extracting components
These benefits come at a cost, but not a large one. The "biggest" cost is
the small cost of computing index arithmetic for each access into
`ArrayHandleStride`. To make this as efficient as possible, there are
conditions that skip over the modulo and divide steps if they are not
necessary. (Integer modulo and divide tend to take much longer than
addition and multiplication.) It is for this reason that we probably do not
want to use this method all the time.
Another cost is the fact that not every `ArrayHandle` can be represented by
`ArrayHandleStride` directly without copying. If you ask to extract a
component that cannot be directly represented, it will be copied into a
basic array, which is not great. To make matters worse, for technical
reasons this copy happens on the host rather than the device.
# Create `ArrayHandleOffsetsToNumComponents`
`ArrayHandleOffsetsToNumComponents` is a fancy array that takes an array of
offsets and converts it to an array of the number of components for each
packed entry.
It is common in VTK-m to pack small vectors of variable sizes into a single
contiguous array. For example, cells in an explicit cell set can each have
a different amount of vertices (triangles = 3, quads = 4, tetra = 4, hexa =
8, etc.). Generally, to access items in this list, you need an array of
components in each entry and the offset for each entry. However, if you
have just the array of offsets in sorted order, you can easily derive the
number of components for each entry by subtracting adjacent entries. This
works best if the offsets array has a size that is one more than the number
of packed vectors with the first entry set to 0 and the last entry set to
the total size of the packed array (the offset to the end).
When packing data of this nature, it is common to start with an array that
is the number of components. You can convert that to an offsets array using
the `vtkm::cont::ConvertNumComponentsToOffsets` function. This will create
an offsets array with one extra entry as previously described. You can then
throw out the original number of components array and use the offsets with
`ArrayHandleOffsetsToNumComponents` to represent both the offsets and num
components while storing only one array.
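A hedged sketch of this workflow (header locations and the returning overload of `ConvertNumComponentsToOffsets` assumed):
```cpp
#include <vtkm/cont/ArrayHandle.h>
#include <vtkm/cont/ArrayHandleGroupVecVariable.h> // declares ConvertNumComponentsToOffsets (assumed)
#include <vtkm/cont/ArrayHandleOffsetsToNumComponents.h>

void Example(const vtkm::cont::ArrayHandle<vtkm::IdComponent>& numComponentsIn)
{
  // Build the offsets array (one more entry than the number of groups).
  vtkm::cont::ArrayHandle<vtkm::Id> offsets =
    vtkm::cont::ConvertNumComponentsToOffsets(numComponentsIn);

  // Re-derive the component counts from the offsets; only offsets is stored,
  // and numComponents[i] == offsets[i + 1] - offsets[i].
  vtkm::cont::ArrayHandleOffsetsToNumComponents<vtkm::cont::ArrayHandle<vtkm::Id>>
    numComponents(offsets);
  (void)numComponents;
}
```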
This replaces the use of `ArrayHandleDecorator` in `CellSetExplicit`.
The two implementations should do the same thing, but the new
`ArrayHandleOffsetsToNumComponents` should be less complex for
compilers.
# `ArrayRangeCompute` works on any array type without compiling device code
Originally, `ArrayRangeCompute` required you to know specifically the
`ArrayHandle` type (value type and storage type) and to compile using a
device compiler. The method is changed to include only overloads that have
precompiled versions of `ArrayRangeCompute`.
Additionally, an `ArrayRangeCompute` overload that takes an
`UnknownArrayHandle` has been added. In addition to allowing you to compute
the range of arrays of unknown types, this implementation of
`ArrayRangeCompute` serves as a fallback for `ArrayHandle` types that are
not otherwise explicitly supported.
If you really want to make sure that you compute the range directly on an
`ArrayHandle` of a particular type, you can include
`ArrayRangeComputeTemplate.h`, which contains a templated overload of
`ArrayRangeCompute` that directly computes the range of an `ArrayHandle`.
Including this header requires compiling for device code.
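A hedged sketch of the precompiled path (headers assumed):
```cpp
#include <vtkm/cont/ArrayRangeCompute.h>
#include <vtkm/cont/UnknownArrayHandle.h>

vtkm::Range FirstComponentRange(const vtkm::cont::UnknownArrayHandle& array)
{
  // One vtkm::Range per (flattened) component of the unknown array's values;
  // no device compiler is needed in this translation unit.
  vtkm::cont::ArrayHandle<vtkm::Range> ranges = vtkm::cont::ArrayRangeCompute(array);
  return ranges.ReadPortal().Get(0);
}
```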
# Implemented ArrayHandleRandomUniformBits and ArrayHandleRandomUniformReal
ArrayHandleRandomUniformBits and ArrayHandleRandomUniformReal were added to provide
an efficient way to generate pseudo-random numbers in parallel. They are based on the
Philox parallel pseudo-random number generator. ArrayHandleRandomUniformBits provides
64 random bits spanning the whole range of UInt64 as its content, while
ArrayHandleRandomUniformReal provides random Float64 values in the range [0, 1). Users can
either provide a seed in the form of Vec<vtkm::UInt32, 1> or use the default random
source provided by the C++ standard library. Both of these ArrayHandles are lazily
evaluated, like other fancy ArrayHandles, so they only have O(1) memory overhead. They
are stateless and functional and do not change once constructed. To generate a new set of
random numbers, for example as part of an iterative algorithm, a new ArrayHandle
needs to be constructed in each iteration. See the user's guide for more detail and
examples.
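A hedged usage sketch (header path assumed):
```cpp
#include <vtkm/cont/ArrayHandleRandomUniformReal.h>

void Example()
{
  // 100 lazily generated Float64 values in [0, 1), seeded explicitly so the
  // sequence is reproducible; the seed is a Vec<vtkm::UInt32, 1>.
  vtkm::cont::ArrayHandleRandomUniformReal<vtkm::Float64> randomReals(
    100, vtkm::Vec<vtkm::UInt32, 1>(0xceed));
  (void)randomReals;
}
```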
# `ArrayHandleGroupVecVariable` holds now one more offset.
This change affects the usage of both `ConvertNumComponentsToOffsets` and
`make_ArrayHandleGroupVecVariable`.
The reason for this change is to remove a branch in
`ArrayHandleGroupVecVariable::Get` that was used to avoid an array overflow.
In theory, this increases performance since, at the CPU level, it removes
penalties due to incorrect branch predictions.
The change affects `ConvertNumComponentsToOffsets` by both:
1. Increasing the number of elements in `offsetsArray` (its second parameter)
by one.
2. Setting `sourceArraySize` as the sum of all the elements plus the new one
in `offsetsArray`.
Note that not every specialization of `ConvertNumComponentsToOffsets` returns
`offsetsArray`. Thus, some of them are not affected.
Similarly, this change affects `make_ArrayHandleGroupVecVariable` since it
expects its second parameter (offsetsArray) to be one element bigger than
before.
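A hedged sketch of the updated usage (header location for `ConvertNumComponentsToOffsets` assumed):
```cpp
#include <vtkm/cont/ArrayHandleGroupVecVariable.h>

void Example(const vtkm::cont::ArrayHandle<vtkm::IdComponent>& numComponents,
             const vtkm::cont::ArrayHandle<vtkm::FloatDefault>& sourceArray)
{
  // offsets now gets numComponents.GetNumberOfValues() + 1 entries, and the
  // total packed size is reported through sourceArraySize.
  vtkm::cont::ArrayHandle<vtkm::Id> offsets;
  vtkm::Id sourceArraySize;
  vtkm::cont::ConvertNumComponentsToOffsets(numComponents, offsets, sourceArraySize);

  // The grouped view is built from the flat source array plus the offsets.
  auto groups = vtkm::cont::make_ArrayHandleGroupVecVariable(sourceArray, offsets);
  (void)groups;
  (void)sourceArraySize;
}
```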
# Algorithms for Control and Execution Environments
The `<vtkm/Algorithms.h>` header has been added to provide common STL-style
generic algorithms that are suitable for use in both the control and execution
environments. This is necessary as the STL algorithms in the `<algorithm>`
header are not marked up for use in execution environments such as CUDA.
In addition to the markup, these algorithms have convenience overloads to
support ArrayPortals directly, simplifying their usage with VTK-m data
structures.
Currently, three related algorithms are provided: `LowerBounds`, `UpperBounds`,
and `BinarySearch`. `BinarySearch` differs from the STL `std::binary_search`
algorithm in that it returns an iterator (or index) to a matching element,
rather than just a boolean indicating whether or not a key is present.
The new algorithm signatures are:
```c++
namespace vtkm
{
template <typename IterT, typename T, typename Comp>
VTKM_EXEC_CONT
IterT BinarySearch(IterT first, IterT last, const T& val, Comp comp);
template <typename IterT, typename T>
VTKM_EXEC_CONT
IterT BinarySearch(IterT first, IterT last, const T& val);
template <typename PortalT, typename T, typename Comp>
VTKM_EXEC_CONT
vtkm::Id BinarySearch(const PortalT& portal, const T& val, Comp comp);
template <typename PortalT, typename T>
VTKM_EXEC_CONT
vtkm::Id BinarySearch(const PortalT& portal, const T& val);
template <typename IterT, typename T, typename Comp>
VTKM_EXEC_CONT
IterT LowerBound(IterT first, IterT last, const T& val, Comp comp);
template <typename IterT, typename T>
VTKM_EXEC_CONT
IterT LowerBound(IterT first, IterT last, const T& val);
template <typename PortalT, typename T, typename Comp>
VTKM_EXEC_CONT
vtkm::Id LowerBound(const PortalT& portal, const T& val, Comp comp);
template <typename PortalT, typename T>
VTKM_EXEC_CONT
vtkm::Id LowerBound(const PortalT& portal, const T& val);
template <typename IterT, typename T, typename Comp>
VTKM_EXEC_CONT
IterT UpperBound(IterT first, IterT last, const T& val, Comp comp);
template <typename IterT, typename T>
VTKM_EXEC_CONT
IterT UpperBound(IterT first, IterT last, const T& val);
template <typename PortalT, typename T, typename Comp>
VTKM_EXEC_CONT
vtkm::Id UpperBound(const PortalT& portal, const T& val, Comp comp);
template <typename PortalT, typename T>
VTKM_EXEC_CONT
vtkm::Id UpperBound(const PortalT& portal, const T& val);
}
```
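A hedged usage sketch of the portal overload (the not-found return value is assumed here, not documented above):
```cpp
#include <vtkm/Algorithms.h>
#include <vtkm/cont/ArrayHandle.h>

vtkm::Id FindValue(const vtkm::cont::ArrayHandle<vtkm::Id>& sortedArray, vtkm::Id value)
{
  // The portal overload returns the index of a matching element; a negative
  // result (assumed) indicates the value is not present.
  auto portal = sortedArray.ReadPortal();
  return vtkm::BinarySearch(portal, value);
}
```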
# `vtkm::cont::internal::Buffer` now can have ownership transferred
Memory once transferred to `Buffer` always had to be managed by VTK-m. This is problematic
for applications that need VTK-m to allocate memory but require the memory's lifetime to
outlast VTK-m's ownership of it.
`Buffer::TakeHostBufferOwnership` allows for easy transfer of ownership of memory out of VTK-m.
When taking ownership of a VTK-m buffer you are provided the following information:
- Memory: A `void*` pointer to the array
- Container: A `void*` pointer used to free the memory. This is necessary to support cases such as allocations transferred into VTK-m from a `std::vector`.
- Delete: The function to call to actually delete the transferred memory
- Reallocate: The function to call to re-allocate the transferred memory. This will throw an exception if users try
to reallocate a buffer that was 'view' only
- Size: The size in number of elements of the array
To properly steal memory from VTK-m you do the following:
```cpp
vtkm::cont::ArrayHandle<T> arrayHandle;
...
auto stolen = arrayHandle.GetBuffers()->TakeHostBufferOwnership();
...
stolen.Delete(stolen.Container);
```
# Redesign of ArrayHandle to access data using typeless buffers
The original implementation of `ArrayHandle` is meant to be very generic.
To define an `ArrayHandle`, you actually create a `Storage` class that
maintains the data and provides portals to access it (on the host). Because
the `Storage` can provide any type of data structure it wants, you also
need to define an `ArrayTransfer` that describes how to move the
`ArrayHandle` to and from a device. It also has to be repeated for every
translation unit that uses them.
This is a very powerful mechanism. However, one of the major problems with
this approach is that every `ArrayHandle` type needs to have a separate
compile path for every value type crossed with every device. Because of
this limitation, the `ArrayHandle` for the basic storage has a special
implementation that manages the actual data allocation and movement as
`void *` arrays. In this way all the data management can be compiled once
and put into the `vtkm_cont` library. This has dramatically improved the
VTK-m compile time.
This new design replicates the basic `ArrayHandle`'s success to all other
storage types. The basic idea is to make the implementation of