Enabling semantics to improve detection of data races and misuses of lock-free data structures

Summary
The rapid progress of multi/many-core architectures has left data-intensive parallel applications not yet fully optimized to deliver the best performance. With the advent of concurrent programming, frameworks offering structured patterns have alleviated developers' burden of adapting such applications to multithreaded architectures. While some of these patterns are implemented using synchronization primitives, others avoid them by means of lock-free data mechanisms. However, lock-free programming is not straightforward, and ensuring an appropriate use of these interfaces can be challenging, since different memory models plus instruction reordering at compiler/processor levels can interfere in the occurrence of data races. The benefits of race detectors are formidable in this sense; however, they may emit false positives if they are unaware of the underlying lock-free structure semantics. To mitigate this issue, this paper extends ThreadSanitizer, a race detection tool, with the semantics of 2 lock-free data structures: the single-producer/single-consumer and the multiple-producer/multiple-consumer queues. With it, we are able to drop false positives and detect potential semantic violations. The experimental evaluation, using different queue implementations on a set of μ-benchmarks and real applications, demonstrates that it is possible to reduce the number of data race warnings by 60% on average and to detect wrong uses of these structures.


INTRODUCTION
As we pave the way towards exascale computing, the use of multi- and many-core architectures to efficiently solve scientific problems becomes a complex challenge that the High Performance Computing (HPC) community needs to face. 1,2 The adoption of parallel programming frameworks that execute multiple processes and/or threads simultaneously relieves developers of the burden of designing and implementing efficient parallel applications from scratch. Despite this, much of the current software is not yet fully adapted to run on recent parallel platforms. In most cases, hardware design progresses faster than the parallelization and optimization of existing software. To deal with this issue, the use of building blocks implementing core functionalities has been a widely accepted approach in the HPC area. 3 Indeed, many scientific parallel applications leverage efficient parallel kernels from highly tuned libraries at the bottom of their food chain. However, these kernels must guarantee correctness and thread safety to generate correct global results.
While parallel programming techniques have been broadly adopted, concurrency bugs, especially data races, have become more frequent. Current race detectors are not able to detect wrong uses of lock-free data structures, which violate their semantics and may generate undefined results.
Given the foregoing, we leverage semantics to improve the detection of data races and misuses of some lock-free data structures.
Specifically, this paper makes the following contributions:
• We formalize the semantics of the single-producer/single-consumer (SPSC) and multiple-producer/multiple-consumer (MPMC) lock-free queues.
• We describe the 2 main extensions to improve detection of data races: (1) to drop false positive data races and (2) to detect misuses even when data races are not encountered.
• We explain in detail how the aforementioned extensions have been implemented into ThreadSanitizer (TSan), a well-known race detector that is part of the low-level virtual machine (LLVM) infrastructure.
• We validate our extension for filtering false positive data races using bounded and unbounded SPSC queues along with different benchmarks and applications from the FastFlow 6 framework. For MPMC queues, we leverage 4 well-known implementations.
• We analyze our extension for detecting misuses through a fault injection mechanism over a set of synthetic benchmarks using SPSC queues.
While some of these results were already presented in Dolz et al,7 the support for MPMC queues and the improved integration of the semantic extensions into TSan plus the detection of misuses are new contributions of this paper.
This paper is organized as follows: Section 2 revisits some related work and highlights the differences regarding this paper's contribution. Section 3 describes the main software pieces used in our research.
Section 4 describes the semantics of the SPSC and MPMC lock-free queues. Section 5 motivates the use of semantics to drop false positives and detect misuses, also when using more restrictive semantics. Section 6 details how our techniques for dropping false positives and detecting misuses have been implemented into the race detection tool. Section 7 evaluates such mechanisms through a series of experiments with benchmarks and real applications that leverage SPSC and MPMC queues. Finally, Section 8 closes the paper with some concluding remarks and future work.

RELATED WORK
Over the years, numerous solutions to detect data races have been proposed. 8 These rely on 3 well-known mechanisms: (1) happens-before relations, (2) locksets, and (3) hybrid approaches, ie, combining both happens-before and locksets mechanisms.
Basically, the happens-before relations 9 are used to detect if 2 conflicting memory accesses are not ordered by synchronization operations, causing a potential data race. The Intel Inspector 10 and Acculock 11 are, respectively, a well-known commercial tool and a recent research tool implementing this algorithm. Nevertheless, software implementations of happens-before-based detectors typically suffer from large runtime overheads, so hardware-based solutions have also been proposed to overcome these issues. 12 On the other hand, the locksets approach reports a data race if there is no common lock held by 2 threads accessing the same memory location. This approach can be found in both static 13 and dynamic tools 14 in the state-of-the-art.
Finally, hybrid approaches take advantage of happens-before mechanisms to reduce the false positives reported by lockset-based race detectors and preserve the performance advantages of the lockset mechanisms. 15 A race detector implementing this approach is TSan. 16 Although the previously mentioned tools help developers find concurrency bugs, they can still miss ad hoc synchronizations and therefore generate benign race reports. To address these issues, 2 main approaches for improving the accuracy of data race detection have been adopted: (1) filtering out benign races and keeping only the harmful ones and (2)

BACKGROUND
In this section, we give an overview of the 2 main software components used in this paper. First, we review some basic concepts about lock-free structures and introduce the 2 main lock-free buffers used in this paper: SPSC and MPMC queues. Next, we revisit the LLVM infrastructure along with the TSan data race detector, which identifies undefined and suspicious behavior of threads.

The LLVM infrastructure and the TSan data race detection tool
LLVM is a compiler infrastructure designed to be a set of reusable libraries with well-designed interfaces. 31 LLVM generates intermediate code that is afterwards converted into machine-dependent assembly code for a specific target platform. Thanks to its high-level API, LLVM provides the ability to develop and integrate new modules to perform compile-time analysis and instrumentation. Taking advantage of the latter feature, several runtime checks and tools have been developed to identify suspicious and undefined behavior of threads. One of them is TSan, a data race detector for applications written in C/C++ or Go that uses compile-time instrumentation to check for non-race-free memory accesses at runtime. 19 Specifically, TSan instrumentation tracks synchronization primitives, thread routines from libpthread, memory allocation routines, dynamic annotations, and other kinds of functions that lead to synchronizations. Its runtime library provides entry points for the instrumented code to keep all the information that is of interest for the race detector. With all these data, 2 race detection mechanisms based on happens-before and locksets relations are applied. As a summary of O'Callahan and Choi, 15 these mechanisms develop the following strategies:
• Happens-before relations detect a potential data race when 2 events a and b access a shared memory location, where at least one of these accesses is a write, and neither a happens-before b nor b happens-before a. They are concurrent, so no causal relationship ordering exists between a and b. 9
• Locksets determine a data race when none of the locks held by a pair of events accessing a shared memory location, where at least one of these accesses is a write, is the same, ie, when the intersection of their locksets is empty.
*In FastFlow, an unbounded queue is implemented using a pool of bounded SPSC queues that grows or shrinks on demand.
Contrary to other race detectors, the TSan detector can be switched to work only with the happens-before mechanism, also known as pure happens-before, or with a combination of both previous mechanisms, referred to as the hybrid mode. 19 While in the first mode concurrency is only checked through happens-before relations, in the hybrid mode, both happens-before and locksets mechanisms are used together to determine if 2 events are concurrent. The reason for having a hybrid mode is that maintaining vector clocks for every shared memory location and every lock, as is the case for the pure happens-before mode, is too expensive in practice. In addition, the pure happens-before mode is less predictable and can miss data races, as too many bogus interthread messages are generated. The hybrid mode avoids these shortcomings by using happens-before relations for memory accesses and locksets for locking primitives.
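As a rough illustration of the 2 mechanisms and of how the hybrid mode combines them, the following Python sketch models each memory access with a vector clock and a lockset. It is a simplified model under stated assumptions, not TSan's implementation: the `Access` record, the function names, and the use of plain dictionaries as vector clocks are hypothetical, and the clocks are assumed to carry only non-lock synchronization such as thread creation and joins.

```python
from dataclasses import dataclass


def happens_before(vc_a, vc_b):
    """True if the event stamped vc_a happens-before the event stamped vc_b."""
    threads = set(vc_a) | set(vc_b)
    no_later = all(vc_a.get(t, 0) <= vc_b.get(t, 0) for t in threads)
    strictly_earlier = any(vc_a.get(t, 0) < vc_b.get(t, 0) for t in threads)
    return no_later and strictly_earlier


def concurrent(vc_a, vc_b):
    """Neither event is ordered before the other."""
    return not happens_before(vc_a, vc_b) and not happens_before(vc_b, vc_a)


@dataclass
class Access:
    clock: dict        # hypothetical vector clock: thread ID -> logical time
    locks: frozenset   # locks held while accessing the shared location
    is_write: bool


def conflicting(a, b):
    """Both mechanisms require at least one of the accesses to be a write."""
    return a.is_write or b.is_write


def happens_before_race(a, b):
    """Happens-before criterion: conflicting and causally unordered."""
    return conflicting(a, b) and concurrent(a.clock, b.clock)


def lockset_race(a, b):
    """Lockset criterion: conflicting and no common lock held."""
    return conflicting(a, b) and not (a.locks & b.locks)


def hybrid_race(a, b):
    """Hybrid criterion as modeled here: both criteria above must flag the pair."""
    return happens_before_race(a, b) and lockset_race(a, b)
```

In this model, 2 unordered writes that share a lock are flagged by the happens-before criterion alone but not by `hybrid_race`, because the lock operations deliberately add no edges to the clocks.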
In summary, the main reasons for selecting TSan as the data race detector and extending it with lock-free data structure semantics are (1) it is the only tool that provides the most

SEMANTICS OF LOCK-FREE QUEUES
In this section, we formally describe the bounded and unbounded SPSC and the MPMC queues along with their semantics for the concurrent lock-free versions. These definitions allow us to proceed further with our rationale for developing rules that guarantee the proper use, among entities, of these lock-free parallel structures.

Formal definition
Consider a queue Q defined as the tuple Q = (buf, pread, pwrite, M), where buf is the buffer, pread and pwrite are the internal read and write pointers for the buffer, respectively, and M is a set comprising the following methods:
• init: Initializes the buffer buf, allocating space of possibly aligned memory and resetting the internal (pread and pwrite) pointers by placing them at the beginning of buf. If buf has already been allocated, this method does nothing.
• push: Enqueues the item into the buffer buf.
• pop: Removes and returns the first item in the buffer buf.
• empty: Returns true if the buffer buf is empty.
• register: Registers a producer or a consumer for allocating internal variables. Note that this function is tied to the specific MPMC queue interface presented in this paper and should be called before the producers and consumers start pushing and popping data, respectively.
Note that, depending on the internal implementation of a particular queue, buf can be expressed in different ways. For example, in a SPSC bounded queue, buf can be declared as a circular buffer, while for a MPMC queue it can be an array or a list of pointers. Figure 1A shows a schema of a SPSC queue, and Figure 1B shows a schema of a MPMC queue used by n producers and m consumers.
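To make the tuple concrete, the following Python sketch gives a sequential model of a bounded queue with a circular buffer. It is only an illustration under assumptions: the class name is hypothetical, empty slots are marked with `None` (so `None` cannot be stored as an item), and none of the memory-ordering concerns of a real lock-free implementation are modeled. Note that `push` only touches `pwrite` and `pop` only touches `pread`.

```python
class BoundedQueueModel:
    """Sequential model of Q = (buf, pread, pwrite, M) with a circular buffer."""

    def __init__(self):
        self.buf = None      # not allocated until init() is called
        self.pread = 0
        self.pwrite = 0

    def init(self, capacity):
        """Allocate buf and reset both pointers; do nothing if already allocated."""
        if self.buf is not None:
            return False
        self.buf = [None] * capacity
        self.pread = 0
        self.pwrite = 0
        return True

    def push(self, item):
        """Enqueue item; return False if the slot at pwrite is still occupied."""
        if self.buf[self.pwrite] is not None:
            return False
        self.buf[self.pwrite] = item
        self.pwrite = (self.pwrite + 1) % len(self.buf)
        return True

    def pop(self):
        """Remove and return the first item, or None if the queue is empty."""
        item = self.buf[self.pread]
        if item is None:
            return None
        self.buf[self.pread] = None
        self.pread = (self.pread + 1) % len(self.buf)
        return item

    def empty(self):
        """True if there is no item at the read pointer."""
        return self.buf[self.pread] is None
```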

Semantics of the concurrent lock-free SPSC queue
The correctness of parallel lock-free SPSC queues, such as the Lamport 28 or FastForward 27 implementations, is only ensured if several usage requirements are met. We define these requirements as the following semantic rules:
1. Roles: A lock-free concurrent SPSC queue instance can be shared by multiple entities acting as initializers, producers, and consumers. Note that a given entity can perform any role; however, at any point in time, at most one producer and one consumer may operate on the same queue concurrently. Furthermore, an initializer cannot operate on the queue concurrently with any other entity. In any other case, we consider the queue misused, with undefined behavior due to the occurrence of potential data races. In particular, the subsets allotted to the different roles of the queue fulfill M = Init∪Prod∪Cons. Note also that methods internally using the pwrite pointer are those assigned to the producer, while those using the pread pointer are related to the consumer.

Initialization methods: The initializers can call to methods belonging to
To formalize the aforementioned semantics, we first make the following definitions. First, we define an event as the invocation of a method at a certain point in time performed by an entity. In our particular case, we distinguish among 3 different event types: production, consumption, and initialization, and denote them as p, c, and i, respectively. Second, we define E as a set of events that is related to each of the preceding method subsets in the queue Q and that stores all past method invocations.
With these definitions, it is possible to control the proper use of the lock-free SPSC queue by checking 3 simple requirements depending on the type of a new incoming event. These requirements, defined in Equations 1, 2, and 3, are checked each time a new production (p′), consumption (c′), or initialization (i′) event occurs, respectively. Assuming that there has been at least one initialization event, Equation 1 ensures that the new production event has a happens-before relation (→) with all past initialization events and is not concurrent (≉) with any past production event. Similarly, Equation 2 performs the same verification but for incoming consumption events.
Additionally, Equation 3 ensures that all initialization events happened sequentially with any other event. That is, all past events of Q have a happened-before relation with the new initialization event i′. If this requirement is not met at some point, it might be that the queue has not been properly initialized, and therefore, it can lead to undefined behaviors. Table 1 illustrates a correct execution sequence using a lock-free SPSC queue. This table is organized as follows: (1)
FIGURE 1 Schemas of the single-producer/single-consumer and multiple-producer/multiple-consumer queues
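The 3 requirements described for Equations 1 to 3 can be sketched as an executable model. The Python code below follows the prose description only; the class name, the single-letter event kinds, and the representation of events as dictionary-based vector clocks are assumptions made for illustration and do not reproduce the TSan extension.

```python
def hb(a, b):
    """Vector-clock test: the event stamped a happens-before the event stamped b."""
    keys = set(a) | set(b)
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))


def ordered(a, b):
    """The 2 events are not concurrent."""
    return hb(a, b) or hb(b, a)


class SPSCChecker:
    """Keeps the past events E of each method subset and checks new events."""

    def __init__(self):
        self.E = {"i": [], "p": [], "c": []}

    def check(self, kind, clock):
        """Return True if the new event of the given kind respects the rules."""
        if kind == "i":
            # Initialization: every past event must happen-before the new one.
            ok = all(hb(e, clock) for events in self.E.values() for e in events)
        else:
            # Production/consumption: at least one initialization exists, all
            # initializations happen-before the new event, and the new event is
            # not concurrent with any past event of the same kind.
            ok = (bool(self.E["i"])
                  and all(hb(i, clock) for i in self.E["i"])
                  and all(ordered(e, clock) for e in self.E[kind]))
        self.E[kind].append(clock)
        return ok
```

For instance, if thread 0 initializes the queue and then creates threads 1 and 2, a push by thread 1 and a pop by thread 2 both pass, whereas a later push by thread 2 fails because it is concurrent with the pushes of thread 1.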

Semantics of the concurrent lock-free MPMC queue
The requirements that guarantee the correctness of parallel lock-free MPMC queues are slightly different from those declared for the SPSC queues. These requirements are defined as the following semantic rules:
1. Roles: A lock-free concurrent MPMC queue instance can be shared among different entities that can act interchangeably as producers and consumers. However, they should have registered themselves in the queue before producing and consuming on it. Also, an initializer cannot operate on the queue concurrently with any other entity, while entity registrations should happen after the last initialization. In any other case, we consider the queue misused, with undefined behavior due to the occurrence of potential data races.

Initialization methods: The initializers can call to methods belonging to
The dash "-" indicates that the semantic rule does not proceed for the case.
5. Registration methods: The registration methods should be invoked before a producer or a consumer starts pushing or popping data from the queue, respectively:
Specifically, all subsets allotted to the different queue roles fulfill
As with SPSC queues, we formalize the aforementioned semantics by making the following assumptions. For the MPMC queues, we add a new event type on top of the previously defined SPSC queue-related events. This event type corresponds to registration operations and is denoted as r.
Furthermore, we specialize consumption, production, and registration events with the caller thread ID, ie, given an arbitrary thread T generating an event e, this event is represented as e_T. Next, we assume that each of the preceding method subsets includes the set E as an attribute holding all past events related to its method invocations.
In this case, the correct use of the lock-free MPMC queues can be controlled with the requirements in Equations 4 to 7. These requirements are checked each time a new production (p′), consumption (c′), registration (r′), or initialization (i′) event occurs. Regarding Equation 4, a new production event p′ performed by the thread with ID T (ie, p′_T) should happen after the registration event r_T and the last initialization event i, considering that i happened-before r_T. Correspondingly, Equation 5 achieves the same goal but for incoming consumption events.
Likewise, Equation 6 ensures that the queue Q has been initialized before a new registration event r′ happens. Finally, Equation 7 guarantees that there are no concurrent events for a new initialization event i′. If any of these requirements is not met, the correct use of a MPMC queue Q cannot be assured because of the occurrence of potential data races.
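The requirements described for Equations 4 to 7 can be modeled in the same spirit. The following Python sketch is based on the prose only; the class name, the event kinds, and the dictionary-based vector clocks are hypothetical, and the choice to keep just the latest registration per thread is a simplifying assumption.

```python
def hb(a, b):
    """Vector-clock test: the event stamped a happens-before the event stamped b."""
    keys = set(a) | set(b)
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))


class MPMCChecker:
    """Checks production, consumption, registration and initialization events."""

    def __init__(self):
        self.inits = []    # clocks of past initialization events
        self.regs = {}     # thread ID -> clock of its latest registration
        self.others = []   # clocks of past production/consumption events

    def check(self, kind, tid, clock):
        """Return True if the new event respects the requirement for its kind."""
        last_init = self.inits[-1] if self.inits else None
        if kind == "i":
            # No past event may be concurrent with a new initialization.
            past = self.inits + list(self.regs.values()) + self.others
            ok = all(hb(e, clock) for e in past)
            self.inits.append(clock)
        elif kind == "r":
            # The queue must have been initialized before registering.
            ok = last_init is not None and hb(last_init, clock)
            self.regs[tid] = clock
        else:
            # "p" or "c": last init -> registration of this thread -> new event.
            reg = self.regs.get(tid)
            ok = (last_init is not None and reg is not None
                  and hb(last_init, reg) and hb(reg, clock))
            self.others.append(clock)
        return ok
```

Unlike the SPSC case, concurrent productions or consumptions from different registered threads are accepted, while an event from an unregistered thread is flagged.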

ENABLING SEMANTICS TO IMPROVE DETECTION OF DATA RACES
In this section, we motivate the use of semantics to improve the detection of data races related to lock-free data structures within TSan.
Specifically, we find that TSan has some shortcomings when dealing with this kind of data types, since it is capable neither of determining whether a data race is harmful nor of determining whether a given shared lock-free data structure has been misused. In the following Sections 5.1 and 5.2, we expose these issues and our contributions to address them.

Dropping false positives
Our first observation is that the implementation of TSan is completely semantics-agnostic regarding lock-free data structures. TSan detects all possible race conditions, regardless of whether they deal with lock-free structures or not. To illustrate this shortcoming, we leverage the execution sequence of Table 3, where a SPSC queue Q is being
The execution flow of sequence in Table 3
In summary, not all race conditions detected by TSan are harmful when dealing with lock-free data structures. Current existing approaches within TSan to filter false positives are blacklists, used to

Detecting misuses
Our second observation is that TSan only warns about potential data races; however, there might exist other semantic misuses of specific lock-free data structures that are not related at all to concurrent memory accesses. The hybrid mode of TSan reports a data race only when neither a happens-before ordering nor a common lock exists, whereas part of the conditions defined in our semantics (see the requirements in Section 4.2) only need to check for happens-before relations. Therefore, even if TSan does not detect any data race between a pair of events accessing a shared memory location, our semantics can detect misuses in SPSC/MPMC lock-free queues. As an example, Table 4 shows a situation where two threads call the init and push functions concurrently, both wrapped in a critical section using the same lock. In this case, the TSan hybrid detector does not report a data race, as the locksets intersection is not empty, but our semantic requirements determine a misuse of the SPSC queue, as there is no happens-before relation between the push and init events. This is because the thread with ID 2 can acquire the lock first, thus calling push on an uninitialized queue and leading to undefined behavior. Therefore, our contribution allows users to detect misuses of this kind of lock-free data structures.
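The situation just described can be reproduced with a small model. In the Python sketch below, events are plain dictionaries holding a vector clock and a lockset; these representations and the function names are assumptions for illustration, and the clocks are assumed not to include edges induced by lock operations.

```python
def hb(a, b):
    """Vector-clock test: the event stamped a happens-before the event stamped b."""
    keys = set(a) | set(b)
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))


def hybrid_race(ev_a, ev_b):
    """Race only if the events are unordered and share no lock (both writes assumed)."""
    unordered = (not hb(ev_a["clock"], ev_b["clock"])
                 and not hb(ev_b["clock"], ev_a["clock"]))
    no_common_lock = not (ev_a["locks"] & ev_b["locks"])
    return unordered and no_common_lock


def spsc_misuse(init_ev, push_ev):
    """Semantic check: the push must happen after the initialization."""
    return not hb(init_ev["clock"], push_ev["clock"])


# Thread 1 calls init and thread 2 calls push, both holding the same lock "m",
# with no causal ordering between the 2 threads.
init_ev = {"clock": {1: 1}, "locks": {"m"}}
push_ev = {"clock": {2: 1}, "locks": {"m"}}

# Variant in which thread 2 is created after init, so init happens-before push.
ordered_push_ev = {"clock": {1: 1, 2: 1}, "locks": {"m"}}
```

Here `hybrid_race(init_ev, push_ev)` is false because the locksets intersect, while `spsc_misuse(init_ev, push_ev)` is true; with `ordered_push_ev`, neither check fires.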

Extending detection of misuses for FastFlow queues
While the previous 2 contributions aim at dropping false positives and detecting misuses in generic SPSC/MPMC queues using the semantics defined in Section 4, there exist other lock-free data structures that have, by definition, more restrictive semantics. To illustrate this situation, we focus on the FastFlow queues, as these structures have specific constraints that limit their use in a particular way. Specifically, the Fast-
By inserting the ID of the caller thread into the corresponding set C of a subset each time a method belonging to it is invoked, it is possible to control the proper use of this kind of lock-free SPSC queue. This can be achieved by checking 3 new requirements. The first ensures that the cardinality of the C set of the initialization, producer, and consumer subsets is always less than or equal to 1; hence, only one and the same entity should use the methods allotted to its role. The second guarantees that both producer and consumer are performing the right roles, ie,
The third requirement guarantees that the queue has been initialized before being used by the producer and the consumer threads: if Init.C = ∅, then Prod.C ∪ Cons.C = ∅.
As an example, Table 5 depicts an execution sequence of a SPSC queue Q concurrently used by constructor, producer, and consumer entities. The arrangement of the table follows the same structure as Table 3; however, in this case, we include the column misuse, which displays whether there has been a violation of the previous semantics in t_i or not. The execution is as follows. After the initialization of Q in t_0, the thread with ID 1 invokes push and creates a thread with ID 2. Then, this new thread mistakenly calls push, a function not allotted to its role, thus violating Equation 9. Nevertheless, given that the consumer thread has been created right after the producer push call, TSan does not detect any data race in t_3.
† In certain cases, the producer or the consumer can perform the role of the initializer, so that only 2 different entities share the same queue.
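The caller sets C and the 3 additional requirements can be sketched as follows. The Python model below is hypothetical: the class and label names are invented for illustration, and since the formula of the second requirement is not reproduced in the text above, the check used for it here (producer and consumer sets must be disjoint) is an assumption rather than the paper's definition.

```python
class FastFlowRoleChecker:
    """Tracks the caller set C of each method subset of one SPSC queue."""

    def __init__(self):
        self.C = {"Init": set(), "Prod": set(), "Cons": set()}

    def record(self, subset, tid):
        """Add the caller thread ID to the subset and return the violations."""
        self.C[subset].add(tid)
        return self.violations()

    def violations(self):
        C = self.C
        found = []
        # First requirement: each caller set holds at most one thread ID.
        if any(len(ids) > 1 for ids in C.values()):
            found.append("cardinality")
        # Second requirement (assumed form): no thread both produces and consumes.
        if C["Prod"] & C["Cons"]:
            found.append("roles")
        # Third requirement: if Init.C is empty, Prod.C and Cons.C must be empty.
        if not C["Init"] and (C["Prod"] | C["Cons"]):
            found.append("uninitialized")
        return found
```

Assuming thread 1 both initializes the queue and acts as producer, a later push by thread 2 makes the producer set hold 2 IDs, which this model reports as a cardinality violation even when no data race is observed.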

IMPLEMENTATION
In this section, we describe implementation details considered to integrate semantics of the bounded and unbounded SPSC and MPMC queues into the data race detector TSan. Particularly, we subdivide this section to explain the implementation details and required modifications in TSan runtime internals to perform the objectives introduced in Section 5.

Distinguishing between multiple queue instances
The first step to embed semantics into TSan is to distinguish between multiple queue instances, given that a multithreaded application can use multiple lock-free SPSC/MPMC queues simultaneously. Therefore, it is necessary to uniquely identify them in the reports generated by TSan. To solve this issue, we have implemented a mechanism within TSan internals that is able to retrieve the C++ implicit this pointer associated with the queue instance involved in a data race. Although there exist several possibilities to implement this approach using communication channels (shared memory or sockets), we have followed a different path that does not require modifications in the user code. Our approach leverages the debugging information, stored in the binary files, to know where exactly the this pointer is placed. It consists of the following steps:
1. First, we leverage the libunwind library 32 to walk the stack backwards until the frame related to the function causing the data race is retrieved. This search is performed in a loop via libunwind function calls, checking each time whether the frame retrieved belongs to a member function of the data structure under study. Once this frame is encountered, we keep the stack context and go to the next step.
Note that if there are inlined functions within the calling stack, it is not possible to retrieve the desired function frame by unwinding the stack. Thus, our approach requires the -O0 flag to suppress automatic inlining at compile time.
2. Next, we use the libdwarf library 33 to navigate the DWARF 34 hierarchy tree and find the location of the this pointer for that specific function. To minimize overheads, we use an internal data structure that stores the locations of this pointers for the different functions of the queue. Thus, this query is only run once per function. Note that obtaining this information strictly requires compiling with -g to generate source-level debug information, since libdwarf accesses the .debug_info section of ELF files.
To illustrate how libdwarf is used, we use the example of the DWARF tree associated with the FastFlow SPSC queue class implementation (uSWSR_Ptr_Buffer) in Listing 2. We next describe the procedure for recovering the this pointer location of the push function. First, we walk the DWARF tree until the subprogram (DW_TAG_subprogram Debugging Information Entry) that matches the class linkage name and function name push is found (line 1). Afterwards, using the class subprogram ID, we seek the push subprogram whose specification matches that ID (line 16). From this point on, we walk the parameters of the push subprogram to look for the this parameter and get its location (line 23). In this case, DW_AT_location equals DW_OP_fbreg -16, which means that the pointer is stored at an offset of -16 bytes from the frame base, ie, 16 bytes below the stack frame pointer. However, depending on compiler optimizations and calling conventions (dictated by the ABI), this pointer can also be placed in other locations. For example, the value DW_OP_reg14 would indicate its presence in register r14.
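The interpretation of the 2 location forms mentioned above, together with the once-per-function cache, can be sketched as follows. This Python model works on the textual form of the location as printed by DWARF dump tools; the function and class names are hypothetical, the `query` callable merely stands in for the libdwarf tree walk, and no real libdwarf or libunwind call is shown.

```python
import re


def resolve_location(expr, frame_base, registers):
    """Interpret a simple DWARF location expression for the this parameter.

    Returns ("memory", address) when the pointer is stored in the stack frame,
    or ("register", value) when it lives in a register.
    """
    match = re.fullmatch(r"DW_OP_fbreg (-?\d+)", expr)
    if match:
        # Signed offset relative to the frame base of the function.
        return ("memory", frame_base + int(match.group(1)))
    match = re.fullmatch(r"DW_OP_reg(\d+)", expr)
    if match:
        # The value is held directly in the numbered register.
        return ("register", registers[int(match.group(1))])
    raise ValueError("unsupported location expression: " + expr)


class ThisLocationCache:
    """Remembers the location expression per function so the query runs once."""

    def __init__(self, query):
        self._query = query   # hypothetical stand-in for the DWARF tree walk
        self._cache = {}

    def location_of(self, function_name):
        if function_name not in self._cache:
            self._cache[function_name] = self._query(function_name)
        return self._cache[function_name]
```

With a frame base of 0x7ffc0010, the expression DW_OP_fbreg -16 resolves to the memory address 0x7ffc0000, from which the this pointer would then be read.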

Implementing semantics
The second step is to implement the 2 levels of semantic verification. To support this functionality, we use several data structures to store the this pointers of queue objects along with their method subsets, which collect the events performed by the threads. To apply the semantic requirements, it is necessary to capture an event each time a thread calls one of the queue member functions. To provide this feature, we take advantage of TSan compile-time instrumentation at the LLVM Intermediate Representation (IR) level to insert calls to our internal functions responsible for checking the semantics of the SPSC/MPMC queues.
To illustrate how TSan provides compile-time instrumentation, we leverage the example in Listing 3, in which the push and pop member functions of the FastFlow SPSC queue class have been instrumented ‡ .
As can be seen, TSan instruments all read and write accesses to nonlocal variables plus the prologue and epilogue of the same function with __tsan_func_entry and __tsan_func_exit, respectively.
In our approach, we also instrument the queue member functions with the routine __tsan_register_event, which is responsible for registering the events that occur on a specific queue.
This routine receives as its sole parameter the queue member routine name to properly unwind the stack and retrieve the this pointer location. Afterwards, the routine adds the pair key-value
‡ Note that this code is illustrative, since the TSan instrumentation is not observable in the user's source code but in the IR code. For the sake of readability and simplicity, we opted for porting the TSan IR instrumentation directly to the user source code.

EXPERIMENTAL RESULTS
In this section, we perform an experimental evaluation of the semantics implemented for the SPSC and MPMC lock-free queues into TSan. In the following, we describe in detail the target platform, software, and benchmark sets used for the evaluation.
Afek's (LCRQ), 29 Fatourou and Kallimanis's (CC), 36 and Yang and Mellor-Crummey's (wait free). 30 Table 6 provides more details about these applications.
(Note that we configured TSan to use the hybrid runtime.) Afterwards, we executed them using a fixed pool of 24 worker threads, ie, to fully populate the cores of IVY. To analyze the amount of false positives filtered out, we gathered the data race reports generated by TSan, while to evaluate the detection of misuses, we collected the results of the reports generated.

Analysis of global data races
Our first experiment statistically analyzes the data races that occurred during the execution of both SPSC and MPMC benchmarks plus the application set, with special focus on those related to the 2 queue types. Specifically, we evaluate how the implementation of semantics helps to drop false positives, as the first contribution of this paper.
SPSC queues. We start by studying the impact of SPSC queue data races on the execution of the SPSC benchmarks and applications. Figure 2 shows (percentagewise) the portion of SPSC-bounded and uSPSC queue-related data races with respect to the others for both sets. Note that we also count as SPSC/uSPSC races those in which only one side was related to a member function of the SPSC/uSPSC queue classes. As observed for the μ-benchmark set, roughly 71% of the data races, on average, were due to SPSC queues. A similar percentage can be observed for the application set (69%). Generally, these percentages give a notion of the importance of this kind of data races occurring in SPSC queue-related functions. In this concrete case, we also observe that uSPSC queues have little impact on the data races detected. This is because uSPSC queues are internally implemented using multiple SPSC queues, and most of the conflicting functions only belong to SPSC queues.
FIGURE 2 Percentage of single-producer/single-consumer (SPSC)/unbounded SPSC (uSPSC) data races with respect to the total for both sets
FIGURE 3 Breakdown of single-producer/single-consumer (SPSC)/unbounded SPSC (uSPSC) data races between benign, undefined, and harmful for both sets
As stated in Section 5, a contribution of the paper is to filter, whenever possible, those benign SPSC/uSPSC queue-related data races according to Equations 1 and 2 stated in Section 4.2. Taking advantage of our implementation, in Figure 3, we classify data races into 3 different groups: benign, undefined, and harmful. These types of races are defined as follows:
• Benign data races represent those complying with both requirements.
• Undefined data races stand for those in which TSan failed to restore the stack of one of the threads involved in the data race, and thus, the semantic requirements could not be checked.
• Harmful data races stand for those in which at least one of the requirements was violated.
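The 3 groups above amount to a simple decision rule, sketched below in Python. The report format (a dictionary with a `stack_restored` flag and a list of requirement outcomes) and the function names are assumptions for illustration; keeping undefined races in the filtered output is also an assumption, chosen because they cannot be confidently classified.

```python
from collections import Counter


def classify_race(stack_restored, requirements_met):
    """Classify one queue-related data race report."""
    if not stack_restored:
        # The semantic requirements could not be checked.
        return "undefined"
    return "benign" if all(requirements_met) else "harmful"


def breakdown(reports):
    """Count how many reports fall into each group."""
    return Counter(classify_race(r["stack_restored"], r["requirements"])
                   for r in reports)


def filter_reports(reports):
    """Drop only the reports classified as benign."""
    return [r for r in reports
            if classify_race(r["stack_restored"], r["requirements"]) != "benign"]
```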
Analyzing the percentage breakdown of the different groups, we observe that a small percentage of data races (about 5%) were classified as undefined. Since we do not know the specific cause that prevented TSan from restoring the stack, we cannot confidently classify these data races as benign or harmful. A deeper understanding of the TSan implementation is needed to understand the nature of such undefined races. This step is left as future work.
To gain insights into this issue, we performed an extra experiment considering the FastFlow implementation of the bounded and unbounded SPSC queues plus the Lamport version. § These tests, buf_spsc, buf_uspsc, and buf_Lamport, corroborate that percentages of the undefined data races are independent of the queue version. Considering that all implementations are semantically correct but data races are still detected by TSan, we assume that they are all false positives.
Similarly, Table 7 combines the breakdown for the different types of data races at SPSC and application levels for both benchmark sets.
Additionally, it incorporates figures representing the total number of data races, the average number of data races per test, and the corresponding percentages over the total data races detected in the application, regardless of their source. Note that the analysis of the SPSC
Finally, the last 2 columns of Table 7 present figures without and with the data race filtering technique, respectively. As can be seen, we reduce the number of data race warnings by about two-thirds for both sets tested. Given that in this case the filtering technique was applied only to the SPSC and uSPSC queues, we are confident that more false positives would have been removed if semantics for other parallel lock-free data structures had been taken into account.
§ The code of these structures can be found in the FastFlow SVN repository https://sourceforge.net/projects/mc-fastflow, more specifically in file ff/buffer.hpp.

MPMC queues.
We perform a similar study of the MPMC queue data races on the execution of the MPMC benchmarks. Figure 4 combines the percentage of MPMC queue races with respect to the total number of data races detected and the breakdown of MPMC data races for its specific benchmarks. As can be seen in the first plot, MPMC data races represent, on average, about 65% of the total data races detected. To gain insights into the root cause of these false data races, we individually review the different MPMC queue implementations of the MPMC benchmark set.
• For the MS lock-free queue, we observe that most false data races occurred when one thread was enqueuing an item while another was executing an atomic CAS operation as part of a pop operation. We also detect that some data races occurred when one thread was performing a CAS operation while another thread was handling a hazard pointer. Note that hazard pointers are an approach to dynamic memory management in lock-free data structures that also addresses the ABA problem. All these data races were considered false positives, since Michael and Scott's implementation has been proven to be a correct lock-free structure.
• For the LCRQ lock-free queue, we detect a behavior similar to that of the MS implementation regarding the occurrence of data races. Basically, a large part of these false positives were caused by one thread pushing an element to the queue while another thread was executing a CAS or FAA atomic operation. Although TSan does instrument atomic operations in these cases, it still reports data races. We believe that these cases are not fully supported by TSan, as otherwise these races would not have been reported. Regardless of these issues, thanks to the implemented semantics we are able to drop them safely.
• The CC queue presents a different behavior. Since this structure is a blocking queue that relies on a coarse-grained lock, the false data races detected mainly stem from the synchronization primitives between threads pushing data to and popping data from the queue. In this case, TSan is also unable to discard them automatically. Furthermore, we note that an important portion of data races occurred between 2 threads calling, respectively, a serial push operation and a POSIX memalign routine.
• The wait-free queue behaves similarly to the MS and LCRQ queue implementations. In this case, the implementation relies on CAS and FAA operations, thus incurring false TSan data race reports within these atomic operations. Specifically, we find data races occurring on a spinning thread while another thread was executing an atomic operation. We believe this is also a limitation of the TSan race detector, as it is not capable of fully handling this kind of atomic operation.
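To make the CAS and FAA operations mentioned in the items above concrete, the following minimal C++ sketch shows both primitives using `std::atomic`. The helper names `claim_slot` and `try_advance` are hypothetical and are not taken from any of the evaluated queue implementations; they only illustrate the kind of atomic index updates that such queues typically rely on.

```cpp
#include <atomic>
#include <cstddef>

// FAA (fetch-and-add): atomically reserves the next slot index and
// returns the value the counter held before the increment.
// Hypothetical helper, not taken from any evaluated queue.
inline std::size_t claim_slot(std::atomic<std::size_t>& tail) {
    return tail.fetch_add(1, std::memory_order_acq_rel);
}

// CAS (compare-and-swap): advances `head` from `expected` to
// `expected + 1` only if no other thread changed it in the meantime.
// Returns true on success and false if the observed value differed.
inline bool try_advance(std::atomic<std::size_t>& head, std::size_t expected) {
    return head.compare_exchange_strong(expected, expected + 1,
                                        std::memory_order_acq_rel,
                                        std::memory_order_acquire);
}
```

Both helpers are well defined under the C++ memory model when called concurrently, which is why warnings that involve only such atomic accesses are candidates for being treated as false positives.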
Focusing on the right-hand side plot of Figure 4, we observe a small percentage of undefined data races. As mentioned in the study of SPSC queue data races, undefined data races occurred when TSan was not able to unwind the stack of the thread that wrote to a memory address while a second thread was reading or writing it. As was done for the SPSC data races, Table 8 combines figures for the total, per-test and percentage of data race warning messages for the aforementioned MPMC queue implementations. As can be seen, on average about 56% of the data races detected are benign, while roughly 8% are undefined. Other data races, not related to MPMC queues, account for about a third of the total warnings generated. Overall, with the semantic verification we are able to detect benign data races and to drop about 56% of the total data race warnings reported to the end user.
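The breakdown discussed above can be sketched as a small classification step. This is a simplified illustration and not the actual TSan extension: the `RaceWarning` fields, the `classify` rule and the `Breakdown` type are assumptions introduced here. A warning is treated as undefined when a call stack could not be restored, as benign when both accesses fall inside queue operations and the queue semantics hold, and as other in every remaining case.

```cpp
#include <cstddef>
#include <vector>

enum class RaceClass { Benign, Undefined, Other };

// Hypothetical, simplified view of one data race warning.
struct RaceWarning {
    bool stacks_restored;  // both call stacks could be unwound
    bool inside_queue;     // both accesses belong to queue operations
    bool semantics_hold;   // the queue usage requirements were met
};

// Assumed classification rule (see the lead-in for its caveats).
inline RaceClass classify(const RaceWarning& w) {
    if (!w.stacks_restored) return RaceClass::Undefined;
    if (w.inside_queue && w.semantics_hold) return RaceClass::Benign;
    return RaceClass::Other;
}

struct Breakdown {
    std::size_t benign = 0;
    std::size_t undefined = 0;
    std::size_t other = 0;
    std::size_t total() const { return benign + undefined + other; }
    // Only benign warnings are dropped; the rest still reach the user.
    std::size_t reported() const { return undefined + other; }
};

inline Breakdown summarize(const std::vector<RaceWarning>& warnings) {
    Breakdown b;
    for (const RaceWarning& w : warnings) {
        switch (classify(w)) {
            case RaceClass::Benign:    ++b.benign;    break;
            case RaceClass::Undefined: ++b.undefined; break;
            case RaceClass::Other:     ++b.other;     break;
        }
    }
    return b;
}

// Share of `part` in `total`, expressed as a percentage.
inline double percent(std::size_t part, std::size_t total) {
    return total == 0 ? 0.0 : 100.0 * static_cast<double>(part) / static_cast<double>(total);
}
```

Feeding this sketch a synthetic set of 100 warnings split 56/8/36, which mirrors the averages above, leaves 44 warnings to report, that is, the undefined and the non-queue ones.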

Evaluation of misuses of lock-free queues
In this section, we evaluate how a set of high-level semantics can improve the detection of specific misuses of lock-free queues, even when TSan does not report any data race. Specifically, we test this feature using the special semantics for the FastFlow queues introduced in Section 5.2, as the second main contribution of this paper. To this end, we use a fault injection mechanism to assess whether misuses of SPSC queues surface as data races or not, while we use semantics to determine whether they are real misuses. We use the synthetic tests uspsc, uspsc_r1, uspsc_r2 and uspsc_lock, the last 3 being faulty on purpose. Each of these tests uses a uSPSC queue in the following ways:
uspsc is composed of a producer and a consumer thread, enqueuing and dequeuing 10 000 elements, respectively. Thus, this test uses the uSPSC queue correctly.
uspsc_r1 is composed of a producer and 2 consumer threads. While the producer pushes 10 000 elements, each of the concur- [...]
[...] uspsc, we note that all data races detected, even for SPSC and uSPSC, are false positives; however, roughly 20% of the SPSC queues internally used by the main uSPSC queue have violated some semantic requirement. Since these results were unexpected, we deeply analyzed the application traces to find the root cause of the misuses. Particularly, the way that a FastFlow uSPSC queue dynamically manages internal SPSC queues (storing them in an internal cache that is again an SPSC queue) [...]
FIGURE 5 Breakdown of single-producer/single-consumer (SPSC)/unbounded SPSC (uSPSC) data races vs. misuses
[...] semantics, the slowdown introduced by our approach ranges between 0.35% and 0.95%; therefore, the overhead is almost negligible and compensated by the benefits.
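The difference between a correct use such as uspsc and a faulty one such as uspsc_r1 can be illustrated with a minimal C++ sketch that replays a trace of queue operations. It assumes that the SPSC requirement amounts to at most one thread ever pushing and at most one thread ever popping on a given queue; the types `QueueEvent` and `Role`, the function `count_spsc_violations` and the integer thread identifiers are hypothetical and do not reproduce the semantics implemented in TSan.

```cpp
#include <cstddef>
#include <vector>

enum class QueueOp { Push, Pop };

// One recorded queue operation; thread ids are modeled as integers.
struct QueueEvent {
    unsigned thread;
    QueueOp op;
};

// Remembers the first thread observed in a given role.
struct Role {
    bool seen = false;
    unsigned thread = 0;
};

// Counts the events that break the assumed SPSC requirement: every push
// must come from the first pushing thread and every pop from the first
// popping thread.
inline std::size_t count_spsc_violations(const std::vector<QueueEvent>& trace) {
    Role producer;
    Role consumer;
    std::size_t violations = 0;
    for (const QueueEvent& e : trace) {
        Role& role = (e.op == QueueOp::Push) ? producer : consumer;
        if (!role.seen) {
            role.seen = true;          // first thread seen takes the role
            role.thread = e.thread;
        } else if (role.thread != e.thread) {
            ++violations;              // a second thread used the same role
        }
    }
    return violations;
}
```

Because the check only inspects which thread plays which role, it can flag a second consumer even when no conflicting memory access is ever observed, which is the situation where a race detector alone stays silent.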

CONCLUSIONS
Data race detectors greatly help developers identify data races in parallel applications. Several postmortem and dynamic approaches for data race detection have been implemented in a range of tools and plug-ins for compilation infrastructures. However, none of them is aware of the semantics behind the data races detected.
The concurrent use of a shared resource within a correct lock-free parallel structure should not imply a data race unless its semantics are violated. In the same way, an application free of data race warnings is not guaranteed to have used its internal data structures properly. In this paper, we focused on SPSC and MPMC queues as lock-free parallel structures and leveraged, as use cases, benchmarks and applications from FastFlow and several state-of-the-art MPMC queue implementations.
Being aware of the importance of these structures, we formalize the semantics of the SPSC and MPMC queues and build a set of requirements to determine whether a queue has been properly used or not.
Afterwards, we implement the formalization of these semantics in TSan, a well-known dynamic data race detector that is part of the LLVM Clang compiler. With it, we provide 2 novel features: (1) filtering data race warnings classified as false positives, and (2) detecting misuses of such lock-free data structures via their semantics. The ability to detect benign data races at runtime is a very helpful feature that prevents users from being overwhelmed by excessive false race reports. It also allows detecting misuses of data structures through a second level of semantic verification, even when data races are not detected. Through these extensions we demonstrate that we are able to discard, on average, 60% of the data race warnings as false positives. We also observe that some wrong uses of lock-free data structures cannot be detected with a race detector but only via high-level semantics.
For future work, we aim at supporting other kinds of lock-free data structures, such as hash tables, sets, red-black trees, etc. In general, we advocate that other lock-free data structures can be supported as long as a formalization of their semantics is feasible. Also, we plan to use semantics to detect other types of catastrophic failures, e.g., deadlocks, livelocks and lock starvation, and to provide support for other architectures, such as PowerPC and ARM.

ACKNOWLEDGMENTS
This work was supported by the EU Project 644235 "RePhrase: Refactoring Parallel Heterogeneous Resource-Aware Applications" under the programme ICT-09-2014.