TBB DeviceAdapterAlgorithms need specializations.
Many of the TBB algorithm implementations either do not scale well or are even outperformed by the serial implementations. The poor performers should be profiled and optimized.
Running https://gitlab.kitware.com/snippets/21 on the outputs of `BenchmarkDeviceAdapter_SERIAL` and `BenchmarkDeviceAdapter_TBB`, collected on a quad-core processor with Hyper-Threading (ideal speedup = 4), yields the results in https://gitlab.kitware.com/snippets/22.
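
For anyone who wants to repeat the comparison on their own machine, here is a minimal sketch of the speedup calculation (serial time divided by TBB time). It is not the script from snippet 21. The function names, the 50% efficiency cutoff, and the timing values are all assumptions made up for illustration, and the numbers are placeholders rather than measurements.

```python
def speedups(serial, parallel):
    """Return {name: serial_time / parallel_time} for benchmarks present in both runs."""
    return {
        name: serial[name] / parallel[name]
        for name in serial
        if name in parallel and parallel[name] > 0
    }


def poor_performers(ratios, ideal=4.0, min_efficiency=0.5):
    """Return names whose speedup is below min_efficiency * ideal, slowest first.

    Both `ideal` (4 for a quad-core machine) and `min_efficiency` are
    hypothetical knobs, not values taken from the benchmark tooling.
    """
    cutoff = ideal * min_efficiency
    return sorted((n for n, s in ratios.items() if s < cutoff), key=ratios.get)


# Placeholder timings in seconds (NOT real benchmark output).
serial_times = {"Sort": 2.0, "ScanInclusive": 1.0, "Reduce": 0.8}
tbb_times = {"Sort": 0.5, "ScanInclusive": 2.0, "Reduce": 0.5}

ratios = speedups(serial_times, tbb_times)
for name in poor_performers(ratios):
    # A ratio below 1.0 means the TBB version is slower than serial.
    print(f"{name}: speedup {ratios[name]:.2f}x")
```

A ratio below 1.0 marks an algorithm where TBB loses to serial outright; a ratio between 1.0 and the cutoff marks one that merely scales poorly. Both groups are candidates for profiling.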