DIY 3.0: data-parallel out-of-core C++ library
DIY currently supports two communication patterns: neighborhood exchange and global reduction. Additional lightweight collectives can be executed over the communication proxy of the above patterns.
The basic building block of communication is the neighborhood. This is a local collection of edges in a communication graph from the current block to other blocks; information is exchanged over these edges. In DIY, the communication proxy is the object that encapsulates the neighborhood. The communication proxy includes the link, which is the collection of edges to the neighboring blocks. The size of the link is the number of edges. In the communication proxy, link->target[i] is the block connected to the ith edge. Edges can be added or removed dynamically. Data can be enqueued to and dequeued from any target in the link as desired, and then all enqueued data are exchanged over the link in one step.
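The enqueue, exchange, dequeue cycle described above can be illustrated with a small standalone model. This is only a sketch: ToyBlock, exchange, and make_ring are hypothetical names invented for the illustration, not DIY classes or functions, and the model runs in a single process with no MPI, threading, or out-of-core movement.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Toy model only: these types are illustrative and are not DIY's classes.
struct ToyBlock {
    int gid;                                   // global id of this block
    std::vector<int> link;                     // link[i] = gid of the block on the i-th edge
    std::map<int, std::vector<int>> outgoing;  // target gid -> values enqueued for it
    std::map<int, std::vector<int>> incoming;  // source gid -> values received from it

    // Queue a value for the block on edge i of the link.
    void enqueue(std::size_t i, int value) {
        assert(i < link.size());
        outgoing[link[i]].push_back(value);
    }

    // Remove and return everything received from the given source block.
    std::vector<int> dequeue(int from_gid) {
        std::vector<int> values;
        values.swap(incoming[from_gid]);
        return values;
    }
};

// Deliver every enqueued value in one step, then clear the outgoing queues.
// Assumes blocks are stored by gid, i.e. blocks[g].gid == g.
void exchange(std::vector<ToyBlock>& blocks) {
    for (ToyBlock& src : blocks) {
        for (auto& kv : src.outgoing) {
            std::size_t target = static_cast<std::size_t>(kv.first);
            std::vector<int>& inbox = blocks.at(target).incoming[src.gid];
            inbox.insert(inbox.end(), kv.second.begin(), kv.second.end());
        }
        src.outgoing.clear();
    }
}

// Build a ring of n blocks; edge 0 points to the left neighbor, edge 1 to the right.
std::vector<ToyBlock> make_ring(int n) {
    assert(n > 0);
    std::vector<ToyBlock> blocks(static_cast<std::size_t>(n));
    for (int g = 0; g < n; ++g) {
        blocks[static_cast<std::size_t>(g)].gid = g;
        blocks[static_cast<std::size_t>(g)].link = {(g + n - 1) % n, (g + 1) % n};
    }
    return blocks;
}
```

In this model nothing moves until exchange is called, which mirrors the "enqueue as desired, then exchange in one step" behavior of the text; how DIY actually buffers and transports the queues is not represented here.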
DIY supports general reductions to implement complex global operations over blocks. These reductions are general-purpose (any global communication pattern can be implemented); they operate over blocks, which cycle in and out of core as necessary; the operations are (multi-)threaded automatically using the same mechanism as foreach(). Although any global communication can be expressed using the reduction mechanism, all the reductions included in DIY operate in rounds over a k-ary reduction tree. The value of k used in each round can vary, but if it's fixed, the number of rounds is log_k(nblocks).
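The relation between k and the number of rounds can be checked with a few lines of arithmetic. The sketch below is standalone and uses hypothetical helper names (rounds_fixed_k, blocks_covered); it is not part of DIY. For block counts that are not an exact power of k it rounds up, which is an assumption of this sketch rather than a statement about how DIY chooses its rounds.

```cpp
#include <cassert>
#include <vector>

// Rounds needed by a k-ary reduction tree with a fixed k: the smallest r with
// k^r >= nblocks. This equals log_k(nblocks) when nblocks is a power of k.
int rounds_fixed_k(int nblocks, int k) {
    assert(nblocks >= 1 && k >= 2);
    int rounds = 0;
    long long reach = 1;  // number of blocks one tree of the current depth can span
    while (reach < nblocks) {
        reach *= k;
        ++rounds;
    }
    return rounds;
}

// With a different k in each round, the rounds together span the product of the k values.
long long blocks_covered(const std::vector<int>& ks) {
    long long covered = 1;
    for (int k : ks) {
        assert(k >= 1);
        covered *= k;
    }
    return covered;
}
```

For 64 blocks, k = 2 gives 6 rounds, k = 4 gives 3, and k = 8 gives 2; a per-round choice such as {8, 8} or {4, 4, 4} spans the same 64 blocks.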
The following patterns are currently available.
- diy::RegularMergePartners, used with diy::reduce
- diy::RegularSwapPartners, used with diy::reduce
- diy::RegularAllReducePartners, used with diy::reduce
- diy::all_to_all

When a lightweight reduction needs to be mixed into an existing pattern such as a neighborhood exchange, DIY has a mechanism for this. An example is a neighbor exchange that must iterate until the collective result indicates it is time to terminate (as in particle tracing in rounds until no block has any more work to do). The underlying mechanism works as follows.

The inputs are pushed by calling all_reduce from each block, and the outputs are popped by calling get from each block. The collective mechanism compares with the above reductions as follows:

- The available operators are maximum<T>, minimum<T>, std::plus<T>, std::multiplies<T>, std::logical_and<T>, and std::logical_or<T>.
- Results are retrieved with the get function, not dequeue (unlike above).
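The push and pop behavior of the proxy collectives, and the "iterate until every block is done" use case, can be modeled in a few lines. This is a single-process sketch under stated assumptions: ToyCollectives, toy_all_reduce, and rounds_until_done are hypothetical names, not DIY's Proxy or CollectivesList, and the reduction is computed directly instead of through a master exchange.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// Toy model only: stands in for one block's list of collective results.
struct ToyCollectives {
    std::deque<int> results;

    // Peek at the oldest result without removing it (repeatable).
    int read() const {
        assert(!results.empty());
        return results.front();
    }

    // Return the oldest result and pop it off the list.
    int get() {
        assert(!results.empty());
        int r = results.front();
        results.pop_front();
        return r;
    }
};

// Combine one input per block with op and deliver the result to every block.
template <class Op>
void toy_all_reduce(const std::vector<int>& inputs,
                    std::vector<ToyCollectives>& blocks, Op op) {
    assert(!inputs.empty() && inputs.size() == blocks.size());
    int result = inputs[0];
    for (std::size_t i = 1; i < inputs.size(); ++i)
        result = op(result, inputs[i]);
    for (ToyCollectives& b : blocks)
        b.results.push_back(result);
}

// Each round, every block does one unit of work and posts a "done" flag;
// iteration stops once the logical-and of all flags is true.
int rounds_until_done(std::vector<int> work) {
    assert(!work.empty());
    std::vector<ToyCollectives> blocks(work.size());
    int rounds = 0;
    bool all_done = false;
    while (!all_done) {
        std::vector<int> done_flags;
        for (int& w : work) {
            if (w > 0) --w;
            done_flags.push_back(w <= 0 ? 1 : 0);
        }
        toy_all_reduce(done_flags, blocks, std::logical_and<int>());
        all_done = true;
        for (ToyCollectives& b : blocks)
            all_done = (b.get() != 0) && all_done;  // every block pops its copy
        ++rounds;
    }
    return rounds;
}
```

The distinction between read and get in the model follows the descriptions of diy::Master::Proxy::read and diy::Master::Proxy::get on this page: read leaves the result in place, get removes it.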
Classes | |
| struct | diy::RegularAllReducePartners |
| Allreduce (reduction with results broadcast to all blocks) is implemented as two merge reductions, with incoming and outgoing items swapped in the second one; i.e., it follows the merge reduction up and then down the merge tree. | |
| struct | diy::RegularBroadcastPartners |
| Partners for broadcast. | |
| struct | diy::RegularMergePartners |
| Partners for merge-reduce. | |
| struct | diy::RegularSwapPartners |
| Partners for swap-reduce. | |
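The all-reduce description above (a merge up the tree, then the same rounds with senders and receivers swapped) can be sketched on a plain array. The function toy_all_reduce_sum is a hypothetical illustration, not DIY code; the grouping of blocks by stride is an assumption of this sketch and is not taken from the partner classes listed here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: values[i] is the value held by block i. The result holds the
// global sum in every position.
std::vector<int> toy_all_reduce_sum(std::vector<int> values, std::size_t k) {
    assert(!values.empty() && k >= 2);
    const std::size_t n = values.size();
    std::vector<std::size_t> strides;  // stride used in each upward round

    // Up the tree: in each round the first block of every group of k
    // still-active blocks (the group root) absorbs the other members' values.
    for (std::size_t stride = 1; stride < n; stride *= k) {
        for (std::size_t root = 0; root < n; root += stride * k) {
            for (std::size_t j = 1; j < k; ++j) {
                std::size_t member = root + j * stride;
                if (member < n) values[root] += values[member];
            }
        }
        strides.push_back(stride);
    }

    // Down the tree: replay the rounds in reverse order with the direction
    // swapped, so each root hands the total back to its group members.
    for (std::size_t r = strides.size(); r-- > 0;) {
        const std::size_t stride = strides[r];
        for (std::size_t root = 0; root < n; root += stride * k) {
            for (std::size_t j = 1; j < k; ++j) {
                std::size_t member = root + j * stride;
                if (member < n) values[member] = values[root];
            }
        }
    }
    return values;
}
```

With five blocks holding 1 through 5 and k = 2, the upward pass leaves 15 in block 0 after three rounds, and the downward pass copies 15 to the remaining four blocks.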
Functions | |
| template<class T, class Op> | |
| void | diy::Master::Proxy::all_reduce (const T &in, Op op) const |
| Post an all-reduce collective using an existing communication proxy. Available operators are: maximum<T>, minimum<T>, std::plus<T>, std::multiplies<T>, std::logical_and<T>, and std::logical_or<T>. | |
| template<class T> | |
| T | diy::Master::Proxy::read () const |
| Return the result of a proxy collective without popping it off the collectives list (same result would be returned multiple times). The list can be cleared with collectives()->clear(). | |
| template<class T> | |
| T | diy::Master::Proxy::get () const |
| Return the result of a proxy collective; result is popped off the collectives list. | |
| CollectivesList * | diy::Master::Proxy::collectives () const |
| Return the list of proxy collectives (values and operations). | |
| template<class Op> | |
| void | diy::all_to_all (Master &master, const Assigner &assigner, const Op &op, int k=2) |
| All-to-all reduction. | |
| template<class Reduce, class Partners, class Skip> | |
| void | diy::reduce (Master &master, const Assigner &assigner, const Partners &partners, const Reduce &reduce, const Skip &skip) |
| Implementation of the reduce communication pattern (includes swap-reduce, merge-reduce, and any other global communication). | |
| template<class Reduce, class Partners> | |
| void | diy::reduce (Master &master, const Assigner &assigner, const Partners &partners, const Reduce &reducer) |
| Implementation of the reduce communication pattern (includes swap-reduce, merge-reduce, and any other global communication). | |
| void diy::Master::Proxy::all_reduce (const T &in, Op op) const | inline |
Post an all-reduce collective using an existing communication proxy. Available operators are: maximum<T>, minimum<T>, std::plus<T>, std::multiplies<T>, std::logical_and<T>, and std::logical_or<T>.
| in | local value being reduced |
| op | operator |
| void diy::all_to_all (Master &master, const Assigner &assigner, const Op &op, int k = 2) |
All-to-all reduction.
| master | block owner |
| assigner | global block locator (maps gid to proc) |
| op | user-defined operation called to enqueue and dequeue items |
| k | reduction fanout |
| void diy::reduce (Master &master, const Assigner &assigner, const Partners &partners, const Reduce &reduce, const Skip &skip) |
Implementation of the reduce communication pattern (includes swap-reduce, merge-reduce, and any other global communication).
| master | master object |
| assigner | assigner object |
| partners | partners object |
| reduce | reduction callback function |
| skip | object determining whether a block should be skipped |
| void diy::reduce (Master &master, const Assigner &assigner, const Partners &partners, const Reduce &reducer) |
Implementation of the reduce communication pattern (includes swap-reduce, merge-reduce, and any other global communication).
| master | master object |
| assigner | assigner object |
| partners | partners object |
| reducer | reduction callback function |