TBB DeviceAdapterAlgorithms need specializations.
Many of the TBB algorithm implementations either do not scale well or are even outperformed by the serial implementations. The poor performers should be profiled and optimized.
Using https://gitlab.kitware.com/snippets/21 on the outputs of BenchmarkDeviceAdapter_SERIAL and BenchmarkDeviceAdapter_TBB, run on a quad-core processor with Hyper-Threading (ideal speedup = 4), yields the results in https://gitlab.kitware.com/snippets/22.
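
For reference, the comparison amounts to dividing each serial time by the matching TBB time and flagging anything well below the ideal of 4. The following is a minimal sketch of that calculation, not the script from the snippet above. The function names, the 50% threshold, and the timings are all hypothetical placeholders chosen for illustration; they are not measured results.

```python
def speedups(serial_times, tbb_times):
    """Return {benchmark: serial_time / tbb_time} for benchmarks in both runs."""
    return {
        name: serial_times[name] / tbb_times[name]
        for name in serial_times
        if name in tbb_times and tbb_times[name] > 0
    }


def poor_performers(ratios, ideal=4.0, threshold=0.5):
    """Return names whose speedup is below threshold * ideal, worst first."""
    cutoff = threshold * ideal
    return sorted((n for n, r in ratios.items() if r < cutoff), key=ratios.get)


# Placeholder timings in seconds, invented for illustration only.
serial = {"Sort": 2.0, "ScanInclusive": 1.0, "Reduce": 0.8}
tbb = {"Sort": 0.5, "ScanInclusive": 2.0, "Reduce": 0.5}

ratios = speedups(serial, tbb)
for name in poor_performers(ratios):
    print(f"{name}: {ratios[name]:.2f}x")
```

A speedup below 1 means the serial implementation is faster than the TBB one, which is the worst case described above; values between 1 and the cutoff indicate poor scaling.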
Edited by Allison Vacanti