Skip to content
Snippets Groups Projects
Commit 250888f7 authored by Thomas Otahal's avatar Thomas Otahal
Browse files

CPU parallel radix sorting

Parallel radix sorting will be invoked in DeviceAdapterAlgorthmTBB.h when
the input is ArrayHandle<T, vtkm::cont::StorageTagBasic> where T is one of
the following basic C++ types:

unsigned int
unsigned short int
unsigned long int
unsigned long long int
unsigned char
char16_t
char32_t
wchar_t
char
short
int
long long
signed char
float
double

If a comparison operator is provided, it must be type std::less<T> or std::greater<T>.

Radix sort implementation is Satish parallel radix sort as documented in the
following citation:

  Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort.
    N. Satish, C. Kim, J. Chhugani, A. D. Nguyen, V. W. Lee, D. Kim, and P. Dubey.
    In Proc. SIGMOD, pages 351–362, 2010

Implementation is based on Takuya Akiba's GitHub source code with the following
changes:

   - Changed parallel threading from OpenMP to TBB tasks
   - Removed pair sorting
   - Added minimum threshold for parallel, will instead invoke serial radix sort (kxsort)
   - Added std::greater<T> and std::less<T> to interface for descending order sorts
   - Added can_use_parallel_radix_sort<T, F>() function to determine if parallel radix sorting
     is possible for type T and compare function F (fallback is std::sort() if not possible)
   - Added linear scaling of threads used by the algorithm for more stable performance
     on machines with lots of available threads (KNL and Haswell)

Added kxsort (serial MSD radix sort by Dinghua Li via GitHub) implementation without modification.
parent 8be1a71a
No related branches found
No related tags found
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment