TBB DeviceAdapterAlgorithms need specializations.

Many of the TBB algorithm implementations either scale poorly or are outperformed outright by their serial counterparts. The poor performers should be profiled and optimized.

Running https://gitlab.kitware.com/snippets/21 on the output of BenchmarkDeviceAdapter_SERIAL and BenchmarkDeviceAdapter_TBB, both run on a quad-core processor with Hyper-Threading (ideal speedup = 4), yields the results in https://gitlab.kitware.com/snippets/22.
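For reference, here is a minimal sketch of how a speedup comparison like this could be computed from paired serial/TBB timings. It is not the script from the snippet linked above. The function names, the 0.5 efficiency threshold, and the timings are all hypothetical and chosen only for illustration.

```python
def speedup(serial_seconds, parallel_seconds):
    """Return serial time divided by parallel time (values > 1 mean TBB is faster)."""
    if parallel_seconds <= 0:
        raise ValueError("parallel time must be positive")
    return serial_seconds / parallel_seconds


def classify(serial_seconds, parallel_seconds, cores=4, threshold=0.5):
    """Label a benchmark by its speedup relative to the ideal speedup (= cores).

    The threshold is an arbitrary cutoff on parallel efficiency (speedup / cores),
    assumed here for illustration only.
    """
    s = speedup(serial_seconds, parallel_seconds)
    if s < 1.0:
        return "slower than serial"
    if s / cores < threshold:
        return "scales poorly"
    return "ok"


# Hypothetical timings in seconds: (serial, TBB). Not taken from the linked results.
timings = {
    "Sort": (2.0, 0.5),
    "ScanInclusive": (1.0, 1.25),
    "Reduce": (1.2, 0.8),
}

for name, (serial, tbb) in timings.items():
    print(f"{name}: speedup {speedup(serial, tbb):.2f}x, {classify(serial, tbb)}")
```

Algorithms labeled "slower than serial" or "scales poorly" by a check like this would be the first candidates for profiling.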

Edited Sep 08, 2017 by Allison Vacanti