A C++17 Thread Pool for High-Performance Scientific Computing

doi:10.5281/zenodo.6931574

Published May 3, 2021 | Version v3.2.0

Software Open

Shoshany, Barak

BS_thread_pool.hpp:
- Main BS::thread_pool class:
  - Added a new member function, push_loop(), which does the same thing as parallelize_loop(), except that it does not return a BS::multi_future with the futures for each block. Just like push_task() vs. submit(), this avoids the overhead of creating the futures, but the user must use wait_for_tasks() or some other method to ensure that the loop finishes executing before its results are used; otherwise the program may read incomplete results or access data that no longer exists.
  - push_task() and submit() now utilize perfect forwarding in order to support more types of tasks - in particular member functions, which in previous versions could not be submitted unless wrapped in a lambda. To submit a member function, use the syntax submit(&class::function, &object, args). More information can be found in README.md. See #9.
  - push_loop() and parallelize_loop() now have overloads where the first argument (the first index in the loop) is omitted, in which case it is assumed to be 0. This is for convenience, as the case where the first index is 0 is very common.
- Helper classes:
  - BS::synced_stream now utilizes perfect forwarding in the member functions print() and println().
  - Previously, it was impossible to pass the flushing manipulators std::endl and std::flush to print() and println(), since the compiler could not figure out which template specializations to use. The new objects BS::endl and BS::flush are explicit casts of these manipulators, whose sole purpose is to enable passing them to print() and println().
  - BS::multi_future::get() now rethrows exceptions generated by the futures, even if the futures return void. See #62.
  - Added a new helper class, BS::blocks, which is used by parallelize_loop() and push_loop() to divide a range into blocks. This class is not documented in README.md, as it most likely will not be of interest to most users, but it is still publicly available, in case you want to parallelize something manually but still benefit from the built-in algorithm for splitting a range into blocks.
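
To illustrate what dividing a range into blocks involves, here is a minimal sketch that uses only the standard library. The function split_range() is a hypothetical helper written for this example; it is not BS::blocks, and the way it spreads the remainder over the first blocks is an assumption that may differ from the library's built-in algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the library): splits the half-open range
// [first, last) into at most num_blocks contiguous half-open blocks.
std::vector<std::pair<std::size_t, std::size_t>> split_range(std::size_t first, std::size_t last, std::size_t num_blocks)
{
    std::vector<std::pair<std::size_t, std::size_t>> blocks;
    if (last <= first || num_blocks == 0)
        return blocks; // Empty range or no blocks requested.

    const std::size_t total = last - first;
    num_blocks = std::min(num_blocks, total); // Never create empty blocks.
    const std::size_t base = total / num_blocks;
    const std::size_t remainder = total % num_blocks;

    std::size_t start = first;
    for (std::size_t i = 0; i < num_blocks; ++i)
    {
        // Assumption: the first `remainder` blocks get one extra index each.
        const std::size_t size = base + (i < remainder ? 1 : 0);
        blocks.emplace_back(start, start + size);
        start += size;
    }
    return blocks;
}
```

Each returned pair is one block [start, end) that could be handed to a separate task. When the range has fewer indices than the requested number of blocks, this sketch returns one block per index.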
BS_thread_pool_test.cpp:
- Added plenty of new tests for the new features described above.
- Fixed a bug in count_unique_threads() that caused it to get stuck on certain systems.
- dual_println() now also flushes the stream using BS::endl, so that if the test gets stuck, the log file will still contain everything up to that point. (Note: It is a common misconception that std::endl and '\n' are interchangeable. std::endl not only prints a newline character, it also flushes the stream, which is not always desirable, as it may reduce performance.)
- The performance test has been modified as follows:
  - Instead of generating random vectors using std::mersenne_twister_engine, which proved to be inconsistent across different compilers and systems, the test now generates each element via an arbitrarily-chosen numerical operation. In my testing, this provided much more consistent results.
  - Instead of using a hard-coded vector size, a suitable vector size is now determined dynamically at runtime.
  - Instead of using parallelize_loop(), the test now uses the new push_loop() function to squeeze out a bit more performance.
  - Instead of setting the test parameters to achieve a fixed single-threaded mean execution time of 300 ms, the test now aims to achieve a fixed multi-threaded mean execution time of 50 ms when the number of blocks is equal to the number of threads. This allows for more reliable results on very fast CPUs with a very large number of threads, where the mean execution time when using all the threads could previously be below a statistically significant value.
  - The number of vectors is now restricted to be a multiple of the number of threads, so that the blocks are always all of the same size.
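
Two of the changes above, deterministic element generation and a vector count that is a multiple of the thread count, can be sketched as follows. Both functions are hypothetical and written only for this example: the formula in make_vector() is an arbitrary stand-in for the operation used in the actual test, and rounding up (rather than down) in round_up_to_multiple() is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: rounds count up to the nearest multiple of num_threads,
// so that the work can be divided into equally sized blocks.
std::size_t round_up_to_multiple(std::size_t count, std::size_t num_threads)
{
    if (num_threads == 0)
        return count; // Nothing to align to.
    const std::size_t remainder = count % num_threads;
    return remainder == 0 ? count : count + (num_threads - remainder);
}

// Hypothetical generator: fills a vector using a fixed arithmetic formula
// instead of a random number engine, so every run, compiler, and system
// produces the same elements.
std::vector<double> make_vector(std::size_t size, std::size_t offset)
{
    std::vector<double> result(size);
    for (std::size_t i = 0; i < size; ++i)
        result[i] = static_cast<double>((i + offset) % 97) / 97.0;
    return result;
}
```

With 10 vectors and 4 threads, for example, this sketch rounds the count up to 12, so each of 4 blocks holds exactly 3 vectors.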
README.md:
- Added instructions and examples for the new features described above.
- Rewrote the documentation for parallelize_loop() to make it clearer.
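
As an illustration of the member function syntax submit(&class::function, &object, args) mentioned above, the following sketch shows how perfect forwarding lets one function template accept free functions, lambdas, and member functions alike. The function run_task() is a hypothetical stand-in that calls the task immediately with std::invoke; it is an assumption made for this example, not the library's implementation, and it involves no threads or futures.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a task-submission function. It forwards the
// callable and its arguments unchanged to std::invoke, which knows how to
// call a pointer to member function on an object pointer.
template <typename F, typename... Args>
decltype(auto) run_task(F&& task, Args&&... args)
{
    return std::invoke(std::forward<F>(task), std::forward<Args>(args)...);
}

// Example class with a member function to be used as a task.
struct counter
{
    int value = 0;
    int add(int amount)
    {
        value += amount;
        return value;
    }
};

// Example free function to be used as a task.
int twice(int x)
{
    return 2 * x;
}
```

Given a counter object c, the call run_task(&counter::add, &c, 5) has the same shape as the submit() syntax described above: the pointer to the member function comes first, then a pointer to the object, then the arguments. Free functions work without any wrapper, as in run_task(twice, 21).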

Notes

If you use this package in published research, please cite it as follows.

Files

Name	Size
bshoshany/thread-pool-v3.2.0.zip (md5:eaa30920f93f902561c03ba2d7684589)	48.5 kB

Additional details

Is supplement to: https://github.com/bshoshany/thread-pool/tree/v3.2.0 (URL)

Statistic	All versions	This version
Views	1,308	54
Downloads	54	1
Data volume	2.8 MB	48.5 kB
