Multithreader creates many unnecessary threads
The multithreader always creates as many threads as there are CPU cores (unless it is explicitly restricted). Regardless of the chosen piece size, and even if only one thread is used for computation, the maximum number of threads is created and destroyed every time, which can add very significant overhead.
For example, on a desktop system with an NVIDIA GPU and an Intel i7 CPU, a 90 fps refresh rate can be achieved when the maximum number of threads is set globally to 1. With default settings, however, the multithreader would need to create and destroy almost 4000 threads per second to reach the same refresh rate (on 8 cores: 7 new threads per filter x 6 filters x 90 fps = 3780 threads per second). That does not happen; instead, the frame rate with default settings drops to about 10 fps.
Proposed solution:
- fix the multithreader logic to create only the threads that are actually used
- allow global scaling of the piece size (so that a balance can be found between the cost of thread creation and the processing speed)
- set a sensible piece size for certain filters (e.g., filters that perform trivial operations should probably have a larger piece size, since their computation time is comparable to the thread creation time)
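The first two points could be sketched roughly as follows. This is only an illustration, not the multithreader's actual code: `UsefulThreadCount`, `minPieceSize` and `pieceScale` are hypothetical names, and the "items" stand for whatever unit the work is split into (e.g., voxels or slices).

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not an existing multithreader API).
// Returns how many threads are worth using for `totalItems` units of work
// when each thread should process about `minPieceSize * pieceScale` items,
// capped at `maxThreads`. `pieceScale` models the proposed global
// piece-size scaling factor.
std::size_t UsefulThreadCount(std::size_t totalItems,
                              std::size_t minPieceSize,
                              std::size_t maxThreads,
                              double pieceScale = 1.0)
{
  if (totalItems == 0 || maxThreads == 0)
  {
    return 0;
  }

  // Scaled piece size, clamped to at least one item per piece
  // (this also covers zero, negative, or NaN scale values).
  const double scaled = static_cast<double>(minPieceSize) * pieceScale;
  const std::size_t piece =
    !(scaled >= 1.0) ? 1 : static_cast<std::size_t>(scaled);

  // Number of pieces, rounded up so that no item is left unprocessed.
  const std::size_t pieces = (totalItems + piece - 1) / piece;

  // Never use more threads than there are pieces of work.
  return std::min(pieces, maxThreads);
}
```

With these assumptions, `UsefulThreadCount(1000, 1000, 8)` yields 1, so a filter whose whole input fits into a single piece would not spawn any extra threads, and raising `pieceScale` to 4.0 reduces `UsefulThreadCount(8000, 1000, 8)` from 8 to 2.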
In our application (3D Slicer) we decided to abandon the default multithreader and switch to TBB (https://github.com/Slicer/Slicer/pull/930), but for other users it would still be useful to implement the steps proposed above.