Published December 27, 2023 | Version v4.0.0
Software
Open
A C++17 Thread Pool for High-Performance Scientific Computing
Description
v4.0.0 (2023-12-27)
- A major new release with numerous changes, additions, fixes, and improvements. Many frequently requested features have been added, and performance has been optimized. Please note that code written using previous releases will need to be modified to work with the new release. The changes needed to migrate to the new API are explicitly indicated below for your convenience.
- Highlights:
- The light thread pool has been removed. However, by default, the thread pool is in "light mode". Optional features that may affect performance must be enabled by defining suitable macros.
- This library now ships with two stand-alone header files:
  - `BS_thread_pool.hpp` contains the main `BS::thread_pool` class and the `BS::multi_future` helper classes, and is the only file needed to use the thread pool itself.
  - `BS_thread_pool_utils.hpp` contains the additional utility classes `BS::signaller`, `BS::synced_stream`, and `BS::timer`, which are fully independent of the thread pool itself and can be used either with or without it.
- It is now possible to assign priorities to tasks. Tasks with higher priorities will be executed first.
- Member functions for submitting tasks and loops have been renamed for consistency, e.g. `detach_task()` and `submit_task()`, where the prefix `detach` means no future will be returned and `submit` means a future or `BS::multi_future` will be returned.
- There are now two ways to parallelize loops into blocks:
  - `detach_blocks()` and `submit_blocks()` behave the same as loop parallelization in previous releases, running the loop function once per block.
  - `detach_loop()` and `submit_loop()` have a simpler syntax, where the loop function is run once per index, so the user doesn't have to manually run the internal loop for each block.
- The new member functions `detach_sequence()` and `submit_sequence()` allow submitting a sequence of tasks enumerated by indices.
- It is now possible to run an initialization function in each thread before it starts to execute any submitted tasks.
- Tasks submitted with `detach_task()` or `submit_task()` can no longer have arguments. Tasks with arguments must be enclosed inside lambda expressions. This simplifies the API and provides better readability. Tasks can still have return values.
- Various ways to obtain information about the threads in the pool have been introduced:
  - The member function `get_thread_ids()` obtains the unique thread identifiers, and `get_native_handles()` obtains the underlying implementation-defined thread handles.
  - The new namespace `BS::this_thread` allows obtaining the thread's index in the pool using `BS::this_thread::get_index()` and a pointer to the pool that owns the thread using `BS::this_thread::get_pool()`.
- Member functions for waiting for tasks have been renamed for brevity: `wait()`/`wait_for()`/`wait_until()`. In addition, these functions can now optionally throw an exception if the user tries to call them from within a thread of the same pool, which would result in a deadlock.
- The first index must now be specified explicitly when parallelizing blocks, loops, and sequences, and it must not be greater than the last index. Also, both indices must now have the same type, or the template parameter should be explicitly specified.
- Optimized the way `detach_blocks()`, `submit_blocks()`, `detach_loop()`, and `submit_loop()` split the range of the loop into blocks.
- Added a utility class `BS::signaller` to allow simple signalling between threads.
- `BS::multi_future<T>` is now a specialization of `std::vector<std::future<T>>` with additional member functions.
- Breaking changes:
- The light thread pool has been removed. The original idea was that the light thread pool will allow the user to sacrifice functionality for increased performance. However, in my testing I found that there was no actual performance benefit to the light thread pool. Therefore, there is no reason to keep it.
- However, by default, the thread pool is in "light mode". Optional features that may affect performance due to additional checks or more complicated algorithms must be enabled by defining suitable macros before including the library:
  - `BS_THREAD_POOL_ENABLE_PAUSE` to enable pausing.
  - `BS_THREAD_POOL_ENABLE_PRIORITY` to enable task priority.
  - `BS_THREAD_POOL_ENABLE_WAIT_DEADLOCK_CHECK` to enable wait deadlock checks.
  - API migration:
    - If you previously used `BS_thread_pool_light.hpp`, simply use `BS_thread_pool.hpp` instead.
    - If you previously used the pausing feature, define the macro `BS_THREAD_POOL_ENABLE_PAUSE` before including `BS_thread_pool.hpp` to enable it.
- Member functions have been renamed for better consistency. Each function has a `detach` variant which does not return a future, and a `submit` variant which does return a future (or a `BS::multi_future`):
  - `detach_task()` and `submit_task()` for single tasks.
  - `detach_blocks()` and `submit_blocks()` for loops to be split into blocks, where the loop function is executed once per block and must have an internal loop, as in previous releases.
  - `detach_loop()` and `submit_loop()` for loops to be split into blocks, where the loop function is executed once per index and the pool takes care of the internal loop.
  - `detach_sequence()` and `submit_sequence()` for sequences of enumerated tasks.
  - API migration: Use the new names of the functions:
    - `push_task()` -> `detach_task()`
    - `submit()` -> `submit_task()`
    - `push_loop()` -> `detach_blocks()`
    - `parallelize_loop()` -> `submit_blocks()`
- `wait_for_tasks()`, `wait_for_tasks_duration()`, and `wait_for_tasks_until()` have been renamed to `wait()`, `wait_for()`, and `wait_until()` respectively.
  - API migration: Use the new names of the functions:
    - `wait_for_tasks()` -> `wait()`
    - `wait_for_tasks_duration()` -> `wait_for()`
    - `wait_for_tasks_until()` -> `wait_until()`
- Functions for parallelizing loops no longer have dedicated overloads for the special case where the first index is 0. These overloads essentially amount to giving the first function argument a default value, which is not allowed in C++, and can be confusing. In addition, indicating the first index explicitly is better for readability.
- API migration: Add the first index 0 manually as the first argument if it was omitted.
- Functions for parallelizing loops no longer allow the last index to be smaller than the first index. Previously, e.g. `detach_blocks(5, 0, ...)` was equivalent to `detach_blocks(0, 5, ...)`. However, this led to confusing results. Since the first argument is the first index and the second argument is the index after the last index (i.e. 0 to 5 actually means 0, 1, 2, 3, 4), the user might get the wrong impression that `detach_blocks(5, 0, ...)` will count 5, 4, 3, 2, 1 instead. This option was removed to avoid this confusion.
  - Sometimes the user might actually want to make a loop that counts down instead of up. This cannot be done by flipping the order of the arguments to e.g. `detach_blocks()` (nor could it be done in previous releases). However, it can be done by simply defining a suitable loop function. For example, if you call `detach_blocks(0, 10, loop, 2)` and define the loop function as `for (T i = 9 - start; i > 9 - end; --i)`, then the first block will count 9, 8, 7, 6, 5 and the second block will count 4, 3, 2, 1, 0.
  - `detach_loop()`, `submit_loop()`, `detach_sequence()`, and `submit_sequence()` work the same way. The first index must be smaller than the last index, but you can count down by writing a suitable loop or sequence function.
  - API migration: Any loop parallelization that used a first index greater than the last index will work exactly the same after switching the first and second arguments so that the smaller index appears first.
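The count-down loop function described above can be checked without the pool. The sketch below is a stand-in rather than library code: `run_blocks` is a hypothetical helper that calls a block function once per half-open range `[start, end)`, sequentially, in the way `detach_blocks(0, 10, loop, 2)` would hand out two blocks. The loop body is the one from the text with `T = int`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the pool: splits [first, index_after_last) into
// num_blocks equal blocks and calls block_fn(start, end) for each, in order.
// For simplicity it assumes the range divides evenly; a real pool would run
// the blocks on worker threads instead.
template <typename T, typename F>
void run_blocks(const T first, const T index_after_last, F&& block_fn, const T num_blocks)
{
    assert(first <= index_after_last && num_blocks > 0);
    assert((index_after_last - first) % num_blocks == 0);
    const T block_size = (index_after_last - first) / num_blocks;
    for (T start = first; start < index_after_last; start += block_size)
        block_fn(start, start + block_size);
}

// Returns the indices visited, in order, when each block counts down using
// the loop from the text. The index type must be signed here: with an
// unsigned type, 9 - end wraps around when end == 10, so the body of that
// block's loop would never run.
inline std::vector<int> count_down_order()
{
    std::vector<int> visited;
    const auto count_down = [&visited](const int start, const int end)
    {
        for (int i = 9 - start; i > 9 - end; --i)
            visited.push_back(i);
    };
    run_blocks(0, 10, count_down, 2);
    return visited;
}
```

Calling `count_down_order()` yields the indices of the first block followed by those of the second. With a real pool the two blocks may run concurrently, so only the order within each block is fixed.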
- Functions for parallelizing loops no longer accept first and last indices of different types. The reason for allowing this previously was that otherwise, writing something like `detach_blocks(0, x, ...)` where `x` is not an `int` would result in a compilation error, since `0` is by default an `int` and therefore the arguments `0` and `x` have different types. However, this behavior, which used `std::common_type` to determine the common type of the two indices, sometimes completely messed up the range of the loop. For example, the `std::common_type` of `int` and `unsigned int` is `unsigned int`, which means the loop will only use non-negative indices even if the `int` start index was negative, resulting in an integer overflow.
  - API migration: If you want to invoke e.g. `detach_blocks(0, x, ...)` where `x` is not an `int`, you can either:
    - Make the `0` have the desired type using a cast or a suffix. For example, if `x` is an `unsigned int`, write `(unsigned int)0` or `0U` instead of `0`.
    - Specify the template parameter explicitly. For example, if `x` is a `size_t`, write `detach_blocks<size_t>(0, x, ...)`.
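The `std::common_type` pitfall mentioned above can be reproduced with the standard library alone. The following is a small illustration, not code from the library: the helper names `as_common_index` and `iterations_after_conversion` are made up for this sketch, and they only mimic what converting both indices to the common type does to a negative start index.

```cpp
#include <limits>
#include <type_traits>

// The common type of int and unsigned int is unsigned int.
static_assert(std::is_same<std::common_type_t<int, unsigned int>, unsigned int>::value,
              "int and unsigned int have the common type unsigned int");

// Converts a signed first index the way a common_type-based overload would.
// Converting a negative int to unsigned int is well defined (modulo 2^N),
// so -1 becomes the largest unsigned value instead of staying negative.
constexpr unsigned int as_common_index(const int first)
{
    return static_cast<std::common_type_t<int, unsigned int>>(first);
}

// Number of iterations that "for (i = first; i < last; ++i)" performs once
// both indices have been converted to the common (unsigned) type.
constexpr unsigned int iterations_after_conversion(const int first, const unsigned int last)
{
    return as_common_index(first) < last ? last - as_common_index(first) : 0U;
}
```

A loop intended to run from -5 up to 5 would therefore not run at all after the conversion, because the converted start index is already far larger than the last index.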
- `detach_task()` and `submit_task()` no longer accept arguments for the submitted task. Instead, you must enclose the function in a lambda expression. In other words, instead of `detach_task(task, args...)` you should write `detach_task([] { task(args...); })`, indicating in the capture list `[]` whether to capture the task itself, and each of the arguments, by value or reference. Please see `README.md` for examples. This was changed for the following reasons:
  - Consistency with `detach_blocks()` and `submit_blocks()`, as well as the new `detach_loop()`, `submit_loop()`, `detach_sequence()`, and `submit_sequence()`, which do not accept function arguments either.
  - In my own multithreaded projects, I find that I almost always need the task to have access to variables in the local scope. This is much simpler, easier, and more concise to do with a lambda capture list, especially an implicit capture `[=]` or `[&]`, than by defining a function that takes arguments and then passing these arguments.
  - Similarly, I find that I mostly submit tasks defined as a lambda on the spot, rather than creating them as separate functions, because it's faster to code and makes it clear exactly what the task does without having to look elsewhere.
  - When users post issues to this repository asking for help with their own code that uses the thread pool, the solution often turns out to be "just wrap that in a lambda". Such issues can be avoided if lambdas must be used to begin with.
  - Submitting member functions, which previously required the awkward syntax `detach_task(&class::function, &object, args...)`, can now be achieved with the much simpler and more readable syntax `detach_task([] { object.function(args...); })` with the appropriate captures.
  - Passing arguments by reference, which previously required using `std::ref`, e.g. `detach_task(task, std::ref(arg))`, can now be achieved with the much simpler and more readable syntax `detach_task([&arg] { task(arg); })`.
  - The new syntax allows specifying the priority of the task easily, as the second argument - otherwise, it would have been hard to distinguish the priority from a task argument, making the API more complicated and confusing. This syntax will also permit adding additional arguments to the member functions as needed in the future.
  - API migration: Enclose all tasks with arguments inside a lambda expression. All submitted tasks must have no arguments, but they can still have return values.
    - Alternatively, `std::bind` can also be used, if the old syntax is preferred to a lambda. Just wrap it around the task and its arguments: instead of `detach_task(task, args...)`, write `detach_task(std::bind(task, args...))`. This achieves the same effect, and can be used to easily convert v3.x.x code to v4.0.0 using a simple regular expression search and replace:
      - `push_task\((.*?)\)` -> `detach_task(std::bind($1))`
      - `submit\((.*?)\)` -> `submit_task(std::bind($1))`
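The two migration routes above, a lambda or `std::bind`, both turn a function with arguments into a callable that takes none. The sketch below shows this with the standard library only. `run_task` is a hypothetical stand-in for a submission function that accepts nullary callables and runs them immediately, and `add_to` is an example task invented for illustration.

```cpp
#include <functional>

// Hypothetical stand-in for a v4-style submission function: it accepts only
// callables that take no arguments, and here simply runs them immediately.
inline void run_task(const std::function<void()>& task)
{
    task();
}

// An ordinary function with arguments, of the kind that v3.x.x code passed
// together with its arguments.
inline void add_to(int& total, const int amount)
{
    total += amount;
}

// Runs the same kind of task twice: once wrapped in a lambda, once wrapped
// in std::bind, and returns the accumulated total.
inline int migrate_both_ways()
{
    int total = 0;
    // Lambda: the capture list states that total is taken by reference.
    run_task([&total] { add_to(total, 5); });
    // std::bind: arguments are copied unless wrapped in std::ref.
    run_task(std::bind(add_to, std::ref(total), 7));
    return total;
}
```

With a real pool the task runs later on another thread, so anything captured or bound by reference has to outlive the task.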
- `BS::synced_stream` and `BS::timer` have been moved to `BS_thread_pool_utils.hpp`.
  - API migration: Include the new header file if either of these utility classes is used.
- `BS_thread_pool.hpp` new features:
- A new optional feature, enabled by defining the macro `BS_THREAD_POOL_ENABLE_PRIORITY`, allows assigning priority to tasks. The priority is a number of type `BS::priority_t`, which is a signed 16-bit integer, so it can have any value between -32,768 and 32,767. The tasks will be executed in priority order from highest to lowest.
  - To assign a priority to a task, add the priority as the last argument to any of the `detach` or `submit` functions. If the priority is not specified, the default value will be 0.
  - The namespace `BS::pr` contains some pre-defined priorities for users who wish to avoid magic numbers and enjoy better future-proofing. In order of decreasing priority, the pre-defined priorities are: `BS::pr::highest`, `BS::pr::high`, `BS::pr::normal`, `BS::pr::low`, and `BS::pr::lowest`.
  - Please see `README.md` for more information, including performance considerations.
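To make "executed in priority order from highest to lowest" concrete, here is a minimal single-threaded sketch built on `std::priority_queue`. It is an illustration under assumptions, not a description of how the library stores its tasks: `priority_t`, `prioritized_task`, and `run_in_priority_order` are names made up for this example, with `std::int16_t` standing in for the signed 16-bit `BS::priority_t`.

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Stand-in for the signed 16-bit priority type described in the text.
using priority_t = std::int16_t;

struct prioritized_task
{
    priority_t priority;
    std::function<void()> task;
    // std::priority_queue pops the "largest" element first, so ordering by
    // priority makes the highest priority come out first.
    bool operator<(const prioritized_task& other) const
    {
        return priority < other.priority;
    }
};

// Queues three labelled tasks out of order and returns the labels in the
// order in which the tasks are executed.
inline std::vector<std::string> run_in_priority_order()
{
    std::vector<std::string> order;
    std::priority_queue<prioritized_task> queue;
    queue.push({0, [&order] { order.push_back("default"); }});
    queue.push({-32768, [&order] { order.push_back("min"); }});
    queue.push({32767, [&order] { order.push_back("max"); }});
    while (!queue.empty())
    {
        // top() returns a const reference, so copy the task out before pop().
        const std::function<void()> task = queue.top().task;
        queue.pop();
        task();
    }
    return order;
}
```

The three priorities used are the default value 0 and the two ends of the 16-bit range. The sketch says nothing about how tasks of equal priority are ordered.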
- The new member functions `detach_loop()` and `submit_loop()` facilitate loop parallelization without having to worry about internal loops in the loop function. In previous releases, the loop function had to be of the form `[](T start, T end) { for (T i = start; i < end; ++i) loop(i); }`. This behavior has been preserved in `detach_blocks()` and `submit_blocks()`. However, the new `detach_loop()` and `submit_loop()` allow much simpler loop functions of the form `[](T i) { loop(i); }`, greatly simplifying the interface.
  - Performance-wise, due to fewer function calls, `detach_blocks()` and `submit_blocks()` are generally faster. However, the difference is usually not significant, and with compiler optimizations there may be no difference at all. In any case, `detach_loop()` and `submit_loop()` are provided as convenience functions, but performance-critical applications can stick with `detach_blocks()` and `submit_blocks()`.
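The relationship between the two loop function forms can be sketched as an adapter that supplies the internal loop around a per-index function. This is only an illustration of the idea, not the library's implementation. `make_block_function` and `square_in_two_blocks` are hypothetical names, and the blocks are run sequentially here instead of on worker threads.

```cpp
#include <cstddef>
#include <vector>

// Wraps a per-index function (the detach_loop() style) in a block function
// (the detach_blocks() style) by supplying the internal loop.
template <typename T, typename F>
auto make_block_function(F index_fn)
{
    return [index_fn](const T start, const T end)
    {
        for (T i = start; i < end; ++i)
            index_fn(i);
    };
}

// Squares every element of a 10-element vector, processing it as the two
// blocks [0, 5) and [5, 10), as a pool might hand them to two workers.
inline std::vector<int> square_in_two_blocks()
{
    std::vector<int> data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    const auto square = [&data](const std::size_t i) { data[i] *= data[i]; };
    const auto block_fn = make_block_function<std::size_t>(square);
    block_fn(0, 5);
    block_fn(5, 10);
    return data;
}
```

The extra call to `index_fn` on every index is the "more function calls" cost mentioned above, which an optimizing compiler can often inline away.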
- The new member functions `detach_sequence()` and `submit_sequence()` facilitate submitting a sequence of tasks enumerated by indices. This is a bit similar to `detach_loop()` and `submit_loop()`, except that the range of indices is not split into blocks with each block containing a smaller range of indices. Instead, there is exactly one task per index. This can be used, for example, to submit a sequence of tasks with each one independently processing a single array element. `detach_sequence()` does not return a future, while `submit_sequence()` returns a `BS::multi_future`.
- It is now possible to run an initialization function in each thread before it starts to execute any submitted tasks. The function must take no arguments and have no return value. It will be executed exactly once, when the thread is first constructed. It can be passed as an argument to the constructor or to `reset()`. See #104, #105, #113, and #119.
- Added a member function `get_thread_ids()` which returns a vector containing the unique identifiers for each of the pool's threads, as obtained by `std::thread::get_id()`. See #126.
- A new optional feature, enabled by defining the macro `BS_THREAD_POOL_ENABLE_NATIVE_HANDLES`, adds a member function `get_native_handles()` which returns a vector containing the underlying implementation-defined thread handles for each of the pool's threads. These can then be used in an implementation-specific way to manage the threads at the OS level; however, note that this will generally not be portable code. See #122.
  - This feature is disabled by default since it uses `std::thread::native_handle()`, which is in the C++ standard library, but is not guaranteed to be present on all systems.
- A new namespace `BS::this_thread` was created to provide functionality similar to `std::this_thread`.
  - `BS::this_thread::get_index()` can be used to get the index of the current thread. If this thread belongs to a `BS::thread_pool` object, it will have an index from 0 to `BS::thread_pool::get_thread_count() - 1`. Otherwise, for example if this thread is the main thread or an independent `std::thread`, `std::nullopt` will be returned.
  - `BS::this_thread::get_pool()` can be used to get the pointer to the thread pool that owns the current thread. If this thread belongs to a `BS::thread_pool` object, a pointer to that object will be returned. Otherwise, `std::nullopt` will be returned.
  - Note that both functions return an `std::optional` object.
- `BS::multi_future<T>` is now defined as a specialization of `std::vector<std::future<T>>`. This means that all of the member functions that can be used on an `std::vector` can also be used on a `BS::multi_future`. For example, it is now possible to use a range-based `for` loop with a `BS::multi_future` object, since it has iterators.
  - In addition to inherited member functions, `BS::multi_future` has the following specialized member functions, most of which are new in this release: `get()`, `ready_count()`, `valid()`, `wait()`, `wait_for()`, and `wait_until()`. Please see `README.md` for more information. See also #128.
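The idea of a vector of futures with a few extra member functions can be sketched as follows. This is a hypothetical class, `multi_future_sketch`, written for illustration and assuming public inheritance from `std::vector<std::future<T>>`. It is not the library's `BS::multi_future`, and it implements only a `get()` that collects all results in order. The futures are created with `std::launch::deferred`, so no threads are involved.

```cpp
#include <future>
#include <vector>

// Hypothetical sketch, not the library's class: a vector of futures that
// adds a get() member function returning all results in order.
template <typename T>
class multi_future_sketch : public std::vector<std::future<T>>
{
public:
    using std::vector<std::future<T>>::vector; // inherit the constructors

    std::vector<T> get()
    {
        std::vector<T> results;
        results.reserve(this->size());
        // The iterators used by this loop are inherited from std::vector.
        for (std::future<T>& future : *this)
            results.push_back(future.get());
        return results;
    }
};

// Collects three deferred computations and retrieves their results.
inline std::vector<int> collect_squares()
{
    multi_future_sketch<int> futures;
    for (int i = 1; i <= 3; ++i)
        futures.push_back(std::async(std::launch::deferred, [i] { return i * i; }));
    return futures.get();
}
```

Because the class derives from the vector, `push_back()`, `size()`, and range-based `for` loops all work on it without further code.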
- A new optional feature, enabled by defining the macro `BS_THREAD_POOL_ENABLE_WAIT_DEADLOCK_CHECK`, allows `wait()`, `wait_for()`, and `wait_until()` to check whether the user tried to call them from within a thread of the same pool, which would result in a deadlock. If so, they will throw the exception `BS::thread_pool::wait_deadlock` instead of waiting.
- `BS_thread_pool_utils.hpp`:
- The utility classes `BS::synced_stream` and `BS::timer` now reside in this header file instead of the main one.
- `BS::timer` has a new member function, `current_ms()`, which can be used to obtain the number of milliseconds that have elapsed so far, but keep the timer ticking.
- The new utility class `BS::signaller` allows simple signalling between threads. It can be used to make one or more threads wait, using the `wait()` member function. When another thread uses the `ready()` member function, all waiting threads stop waiting. This class is really just a convenient wrapper around `std::promise`, which contains both the promise and its future.
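A wrapper of the kind described for `BS::signaller`, holding a promise together with its future, could look roughly like the sketch below. The class `signaller_sketch` and the helper `release_waiters` are hypothetical and written only to illustrate the mechanism. The use of `std::shared_future`, so that several threads can wait at once, is an assumption of this sketch and not a statement about the library's code.

```cpp
#include <atomic>
#include <future>
#include <thread>
#include <vector>

// Hypothetical sketch of a promise-based signaller; not the library's code.
class signaller_sketch
{
public:
    signaller_sketch() : future(promise.get_future().share()) {}

    // Unblocks every thread that is, or later becomes, blocked in wait().
    void ready() { promise.set_value(); }

    // Blocks until ready() has been called.
    void wait() const { future.wait(); }

private:
    std::promise<void> promise; // declared first, so it is constructed first
    std::shared_future<void> future;
};

// Starts n threads that all wait on one signaller, releases them with a
// single ready() call, and returns how many of them got past wait().
inline int release_waiters(const int n)
{
    signaller_sketch sig;
    std::atomic<int> released{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&sig, &released] { sig.wait(); ++released; });
    sig.ready();
    for (std::thread& thread : threads)
        thread.join();
    return released.load();
}
```

Since a promise can be satisfied only once, a signaller built this way is single-use: calling `ready()` a second time would throw `std::future_error`.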
- `BS_thread_pool.hpp` bug fixes and minor changes:
- Optimized locking in the worker function. This should result in increased performance.
- Optimized the way `detach_blocks()`, `submit_blocks()`, `detach_loop()`, and `submit_loop()` split the range of the loop into blocks. All blocks are now guaranteed to have one of two sizes, differing by 1, with the larger blocks always first. See #96.
  - For example, in previous releases, 100 indices were split into 15 blocks as 14 blocks of size 6 and one additional block of size 16, which was suboptimal. Now they are split into 10 blocks of size 7 and 5 blocks of size 6, which means the tasks are as evenly distributed as possible.
- Fixed a bug that caused paused pools to have high idle CPU usage if pausing was used. See #120.
- The worker now destructs the task object as soon as it finishes executing. See #124 and #129.
- Added Markdown inline code formatting in all comments whenever applicable, which makes the comments look nicer when displayed as a tooltip in Visual Studio Code or other supporting IDEs.
- The `BS::thread_pool::blocks` helper class has been moved into the main thread pool class, and now returns a degenerate object (zero blocks) if `index_after_last <= first_index`.
- `BS_thread_pool_test.cpp`:
- Removed tests for the light thread pool.
- Added/modified tests for all new/changed features.
- Many of the previous tests have been simplified and optimized.
- The program now takes command line arguments:
  - `help`: Show a help message and exit.
  - `log`: Create a log file.
  - `tests`: Perform standard tests.
  - `deadlock`: Perform long deadlock tests.
  - `benchmarks`: Perform benchmarks.
  - If no options are entered, the default is: `log tests benchmarks`.
- By default, the test program enables all the optional features by defining the suitable macros, so it can test them. However, if the macro `BS_THREAD_POOL_LIGHT_TEST` is defined during compilation, the optional features will not be tested.
- Instead of using a pre-defined list to specify the number of loop blocks to try in the benchmarks, the program now simply keeps increasing the number of blocks until it finds the optimal value. Often, the optimal number of blocks is much higher than the number of hardware threads, but if the number is too high it will result in diminishing returns.
- `check_loop_no_return()` now checks that the loop modifies all the indices exactly once, to detect cases where an index has been modified more than once, e.g. if the same loop index was erroneously placed in more than one block.
- Instead of defining `_CRT_SECURE_NO_WARNINGS`, the program now uses `localtime_s` instead of `std::localtime` if MSVC is detected to avoid generating a warning.
- On macOS, the test program will exit with `std::terminate()` instead of `std::quick_exit()` if any tests failed. This is because macOS does not implement `std::quick_exit()` for some reason. Note that as a result, the number of failed tests cannot be returned by the program on macOS. Unfortunately, `std::exit()` cannot be used here, as it might get stuck if a deadlock occurs. See #106.
- The log file now uses the name of the executable file, followed by the date and time, so it's easy to distinguish between log files generated by different builds of the test (since the test script names them based on the compiler used). Also, the program now checks if the log file failed to open for some reason, and writes only to the standard output in that case.
- The benchmarks now display a progress bar.
- The test program will now detect the OS and compiler used.
- `BS_thread_pool_test.ps1`:
- The script will compile and run a light version of the test, with no optional features enabled, in addition to the main test, for each compiler.
- The source and build folders will now be determined relative to the script folder, to ensure that the script works no matter which folder it is executed from.
- The script now checks that the include files `BS_thread_pool.hpp` and `BS_thread_pool_utils.hpp` are present before attempting to compile the test program.
- `README.md`:
- Added/modified documentation for all new/changed features.
- Revised many of the existing examples and explanations.
- Added a complete library reference at the end of the documentation.
- Added instructions for installing the package using Meson and CMake with CPM. The installation instructions with various package managers and build systems were moved to the end, before the reference.
- Miscellaneous:
- A `.clang-tidy` file is now included, with all the checks that are enabled in this project. The pull request template has been updated to suggest that authors lint their code using this file before submitting the pull request.
- This release is dedicated to my wife (since December 1, 2023), Pauline. Her endless love, support, and encouragement have been a great source of motivation for working on this and other projects. I am so lucky and honored to `my_future.share()` with her ❤️
Files
| Name | Size | md5 |
|---|---|---|
| bshoshany/thread-pool-v4.0.0.zip | 88.9 kB | 339085bafd8993b1e5f73241aee9f0b3 |
Additional details
Related works
- Is supplement to: Software: https://github.com/bshoshany/thread-pool/tree/v4.0.0