A Wave-Based Request-Response Protocol for Latency Minimization in WSNs

Transmission latency is a key performance metric in most wireless sensor network (WSN) applications. Nodes in a WSN often keep their radio transceivers off and turn them on periodically using a duty cycling mechanism. Duty cycling is a major source of delay in the network, because transmissions must wait for the next receiver wake-up. In this paper, we present a cross-layer approach to minimize the latency of a request-response (RR) protocol adopted in an IEEE 802.15.4-based WSN where the IPv6 routing protocol for low-power and lossy networks (RPL) is used. Extra wake-ups are generated dynamically to match the predicted arrival time of the response packet, in order to reduce the duty cycling delay. The proposed approach is verified with the Cooja simulator, relying on the Contiki operating system (OS). The observed experimental results show a shorter RR delay with respect to a phase alignment (PA) approach.


I. INTRODUCTION
Energy efficiency and latency are key performance metrics in most wireless sensor network (WSN) applications. The radio transceiver is one of the components with the highest power consumption on a low-power wireless sensor node. Therefore, nodes of a WSN often keep their radio transceivers off as much as possible, to prolong their batteries' lifetimes. Since a node cannot receive any data when the transceiver is turned off, a duty cycling mechanism must be used at the medium access control (MAC) layer, to periodically turn the radio on. Hence, duty cycling is a major source of delay, because packets must wait at the sender node for the receiver's wake-up before they can be sent.
The IPv6 routing protocol for low-power and lossy networks (RPL) [1] has been proposed to provide IPv6 connectivity on low-power radios. RPL leads to a tree-like network topology, anchored at a sink node. In a typical WSN, nodes proactively send periodic sensor reading updates to the sink, complying with an update frequency that is limited by power constraints and network throughput. However, specific applications require sensor nodes to be queried aperiodically, using a request-response (RR) protocol, when sensor data are urgently needed in response to external unpredictable events (e.g., alarms or user interaction). Hence, the query should be answered as quickly as possible, to increase responsiveness.
The main contribution of this paper is a cross-layer approach for delay optimization of an RR protocol. The approach requires the transmission of a single pair of IP packets: a request by the sink node and the response by the sensor node. The duty cycling delay is reduced by making the nodes wake up as if they were "hit" by (upward and downward) "waves," i.e., sequentially according to their depths. Namely, we configure the nodes' wake-up phases so that: a wave of wake-ups carries the request packet downward from the sink to the required sensor node with a minimal delay; and a dynamic second wave is generated symmetrically for the response packet moving upward in the network toward the sink. The proposed approach is tested with the Contiki operating system (OS) in the Cooja simulator [2], showing a shorter RR delay with respect to a simpler phase alignment (PA) approach, such as the one used in the RAWMAC low-latency harvesting protocol [3], which is based on the use of static waves. Our analysis highlights interesting tradeoffs between RR delay and energy consumption.
The rest of this paper is organized as follows. In Section II, an overview of related work on duty cycling mechanisms is provided. Section III briefly introduces ContikiMAC [4], RPL, and RAWMAC protocols. The proposed approach is described in Section IV, while experimental results are shown in Section V. Finally, in Section VI we draw our conclusions.
II. RELATED WORK
In the literature, it is possible to identify the following two main categories of radio duty cycling mechanisms: 1) synchronous mechanisms, requiring a complete synchronization between neighboring nodes (e.g., T-MAC [5]) and 2) asynchronous mechanisms, which do not depend on any a-priori synchronization and can be further subdivided into sender-initiated (e.g., B-MAC [6]) and receiver-initiated (e.g., BladeMAC [7]). As illustrative examples, with the WiseMAC protocol [8] nodes learn the wake-up phase of each other through a phase-lock optimization; in the X-MAC protocol [9], the sender wakes up the receiver using short preambles. The CXMAC protocol is a simplified implementation of X-MAC, where a mote periodically sends a short probe; when a potential receiver wakes up and gets such a probe with its own address, it replies with an ACK [10], [11]. In [12], a data aggregation approach for WSNs, based on the joint generation of a conflict-free schedule and an aggregation tree, is proposed. In [13], the latency minimization problem is addressed in a cloud-oriented way with an aggregation scheme centered at the cluster head. Finally, a joint duty-cycle optimization control is proposed in [14], trying to define a global power management strategy suitable for WSNs, especially in industrial scenarios. The derivation of traffic latency minimization approaches in heterogeneous environments represents an interesting research activity which goes beyond the scope of this paper and will be the subject of future work.
The approach proposed in this paper relies on the RPL protocol, which is recalled in Section III-B. Opportunistic routing in wireless sensor networks (ORW) [15] is a routing mechanism that forwards packets to the first awake neighbor offering routing progress toward the destination node. In [16], an RPL routing metric for delay minimization is presented. The routing over low power and lossy networks (ROLL) Working Group [17] has specified OF0 [18], in which the only routing metric adopted is the hop count. The minimum rank with hysteresis objective function (MRHOF) [19] minimizes metrics that are additive along a route, and uses hysteresis to reduce churn in response to small metric changes. Other existing approaches rely on: predetermined scheduling mechanisms, such as WirelessHART [20] or time-slotted channel hopping (TSCH) [21]; periodic and asynchronous wake-up mechanisms, such as low power listening (LPL) [22] or low power probing (LPP) [23]; dynamic slot allocation schemes, such as GoMacH [24], ZMAC [25], and iQueue-MAC [26].

III. ContikiMAC, RPL, AND RAWMAC
The proposed approach is based on ContikiMAC, RPL, and RAWMAC [3]. A short overview of these protocols is presented in the following.

A. ContikiMAC
ContikiMAC is an asynchronous and sender-initiated radio duty cycling protocol. The nodes periodically wake up to check for possible incoming packet transmissions. The period between wake-ups, defined as cycle time, is denoted as $C_T$. All nodes in a network have the same cycle time, but relative wake-up phases between pairs of nodes are essentially random, as they depend on the power-on instants of the nodes. A sender repeatedly transmits its packet until it receives a link-layer acknowledgment (ACK) from a receiver. This implies that the packet is repeated, in the worst case, for an entire cycle time, to ensure that the receiver awakes at least once. When the receiver detects a packet transmission during a wake-up, it keeps its radio transceiver on to receive the entire packet. Then, if the receiver is the recipient of the packet, an ACK packet is sent back to the sender. In order to reduce energy consumption and radio channel occupancy, ContikiMAC introduces a phase-lock mechanism. By recording, for each neighboring node, the last instant an ACK was received, a node can estimate the wake-up time of each neighbor, assuming a constant wake-up period $C_T$. Then, a transmission is started a small guard time $P_g$ (dimension: ms) before the estimated receiver's wake-up.
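
As a rough illustration of the phase-lock idea described above, the following Python sketch estimates a neighbor's next wake-up from the last recorded ACK instant and starts the transmission one guard time earlier. The function names, the integer millisecond time base, and the choice to start immediately when the guard window has already begun are assumptions made for this example; they are not taken from the ContikiMAC implementation.

```python
def next_wakeup(last_ack_ms, cycle_ms, now_ms):
    """Earliest estimated neighbor wake-up at or after now_ms.

    Assumes the neighbor wakes up every cycle_ms, in phase with the
    instant at which its last ACK was received (last_ack_ms).
    """
    if now_ms <= last_ack_ms:
        return last_ack_ms
    elapsed = now_ms - last_ack_ms
    cycles = (elapsed + cycle_ms - 1) // cycle_ms  # integer ceiling
    return last_ack_ms + cycles * cycle_ms


def tx_start(last_ack_ms, cycle_ms, guard_ms, now_ms):
    """Instant at which the sender should start transmitting.

    The transmission begins guard_ms before the estimated wake-up, or
    right away if that instant is already in the past (an assumption).
    """
    wake = next_wakeup(last_ack_ms, cycle_ms, now_ms)
    return max(now_ms, wake - guard_ms)


if __name__ == "__main__":
    # Illustrative values: 125 ms cycle time, 16 ms guard time.
    print(tx_start(last_ack_ms=1000, cycle_ms=125, guard_ms=16, now_ms=1300))
```

With these illustrative values, the sender stays silent until shortly before the neighbor's estimated wake-up instead of repeating the packet for up to a whole cycle time.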

B. RPL Routing Protocol
RPL is a distance-vector routing protocol based on the organization of the nodes in a tree-like network topology, referred to as destination-oriented directed acyclic graph (DODAG). The tree is anchored at a node, denoted as DAG root, and the cost of each path is evaluated according to metrics defined in an objective function (OF). Each node selects a parent node, which is the neighboring node with the shortest path to the root node. Thus, unicast communications between any node and the DAG root are optimized according to the metric. The current RPL implementation for the Contiki OS adopts, as default, the expected transmission count (ETX) metric [27], which tries to minimize the average number of packet transmissions required in order to deliver a packet to the ultimate destination. Nevertheless, other solutions can be adopted, leading, for example, to paths with the smallest number of poor quality links or of intermediate hops. RPL uses two types of control messages to maintain the topology: 1) a node broadcasts DODAG information objects (DIOs) to inform nearby nodes about its distance to the DAG root and 2) destination advertisement objects (DAOs) are unicast messages sent to the selected parent, used to populate the routing tables of ancestor nodes in the DODAG. Moreover, RPL uses a trickle mechanism to reduce the transmission of redundant DIO messages when the network is stable.

C. RAWMAC
RAWMAC is an adaptation layer in which RPL configures the ContikiMAC wake-up phase. In detail, in a WSN where nodes send data to the root node, RAWMAC reduces the delay between the instants of packet creation at the sensor node and reception at the root node. A node with RAWMAC exploits the ContikiMAC phase discovery mechanism to align its phase so that it wakes up right before its preferred parent. When a node receives, at its own wake-up time, a packet to be routed upward, the parent's wake-up is about to occur. Therefore, a data propagation wave is created, from the leaves of the DODAG to the root. Packets traveling along this wave can reach the root node with minimal latency.

IV. PROPOSED APPROACH
We consider a generic RR protocol in a network organized as an RPL DODAG, in which a request IPv6 packet is sent by the DAG root node downward, toward a target node in the network. After a known processing time, the target node sends back a response upward to the DAG root. Our main goal is to minimize the delay between the generation of the request and the reception of the response at the DAG root.
To minimize the downward delay from the root node to the target, the nodes align their radio wake-up phases so that they wake up in downward sequence, i.e., each node wakes up right after its parent in the RPL DODAG. Therefore, nodes which relay packets from parent to child have to wait only a short time for the wake-up of the next-hop node. The downward PA approach is detailed in Section IV-A.
To minimize the upward delay of the response from the target node to the DAG root, intermediate nodes traversed by the request schedule an extra wake-up when the response is going to traverse them toward the DAG root. Therefore, nodes which relay the response have to wait only a short time for the wake-up of the parent node. The upward response transmission optimization is described in Section IV-B.
In Section IV-C, the proposed protocol is integrated with the RAWMAC low-latency harvesting protocol [3]. Both approaches in Sections IV-A and IV-B introduce new types of wake-ups, which may interfere with the ContikiMAC phase-lock mechanism: this issue is addressed in Section IV-D.

A. Wake-Up Phase Alignment
Nodes can, in principle, generate packets at any time, but a node can receive packets to be routed only at a wake-up. To reduce the downward delay for packet relay, each node shifts its wake-up phase so that it is aligned with that of its parent. A positive time offset $P_o$ (dimension: ms) is added to the phase, in order to account for the time necessary to receive the packet and send it to the next node. The offset $P_o$ should be chosen carefully: if it is too short, the packet may "miss" the wake-up of the next node and then needs to wait a whole cycle time for the next wake-up; if it is too long, the packet waits uselessly and the transmission delay increases.
When a node C receives an ACK from its parent B at time $t_{C,B}$, it changes its own wake-up phase $\phi_C$ to the following $\phi'_C$:

$$\phi'_C = (t_{C,B} + P_o) \bmod C_T \quad (1)$$

where the modulus operator ($a \bmod b$) returns the remainder of the division of $a$ by $b$. To prevent frequent updates due to small inaccuracies in the measurement of $t_{C,B}$, the phase is updated only if it changes by more than a (properly chosen, as will be shown in Section V-A) threshold.

After the PA has completed, the network behavior for the RR protocol is shown in Fig. 1. On the left, the request packet is transmitted from the root node to a target node. Nodes appear to wake up in succession, as if hit by waves, and the waves carry packets quickly to the destination. The packet may be created at any time with a uniform distribution, so it waits at the root, on average, a time interval equal to $C_T/2$ for the next wave. The root starts transmitting a little earlier (by the guard time $P_g$) than the predicted wake-up time. Then, every intermediate hop adds $P_o$ to the delay. Finally, the reception time at the last node is denoted as $P_l$. The total downward average delay can thus be (theoretically) written as

$$D_d(h) = \frac{C_T}{2} + P_g + (h - 1) P_o + P_l \quad (2)$$

where $h$ is the depth of the target node, i.e., the number of hops from the DAG root.

The response packet is transmitted back to the root node as illustrated in the right part of Fig. 1. A request is indeed able to quickly reach (propagating downward) the target node. However, in the upward direction, the response packet has to wait $C_T - P_o$, at each intermediate node, for the next wake-up of the parent node. Therefore, the theoretical average RR delay at the DAG root with downward PA, $D_r(h)$, is given by (3). For sufficiently large values of $h$, $D_r$ is almost independent of $P_o$.

In particular, a similar result would be obtained in [...]. After a fourth collision, the packet is dropped [28].
Therefore, the average collision delay $D_{c,\mathrm{hop}}$ over a single hop is given by (4). Hence, indicating the packet collision probability as $p_c$, the probabilities of exactly zero, one, two, and three collisions can be evaluated as in (5). Using (4) and (5), the average supplementary collision delay for a single hop can be computed as in (6). Therefore, since the delay $D_{c,\mathrm{hop}}$ affects each hop upward and downward, the total $D_{\mathrm{coll}}(h)$ for a target node at depth $h$ can be expressed as in (7). Finally, since typically the packet collision probability $p_c \ll 1$, it can be concluded that the probabilities of multiple collisions are negligible with respect to that of a single collision (e.g., $p_{c,3} \ll p_{c,1}$). This allows $D_{\mathrm{coll}}(h)$ to be approximated as in (8).

Fig. 2. Timeline of a network with PA and response optimization. A request Q is sent from the root node and a response R is received. Standard wake-ups are represented by red vertical lines. Green lines represent extra wake-ups.
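
The phase alignment rule and the downward delay discussed in this subsection can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the new phase is taken as the ACK instant plus $P_o$, modulo $C_T$; the threshold test uses a circular distance (a choice made here, not stated in the text); and the downward delay sums the terms listed in the text ($C_T/2$, $P_g$, one $P_o$ per intermediate hop, and $P_l$). All function names and numeric values are hypothetical.

```python
def circular_diff(a, b, period):
    """Shortest distance between two phases on a circle of length period."""
    d = (a - b) % period
    return min(d, period - d)


def update_phase(current_phase, ack_time, offset, cycle_time, threshold):
    """Return the wake-up phase of a child after an ACK from its parent.

    The candidate phase is (ack_time + offset) mod cycle_time; it replaces
    the current phase only if it differs by more than threshold.
    """
    candidate = (ack_time + offset) % cycle_time
    if circular_diff(candidate, current_phase, cycle_time) > threshold:
        return candidate
    return current_phase


def downward_delay(depth, cycle_time, guard, offset, rx_time):
    """Average downward delay for a target at the given depth (assumed form)."""
    return cycle_time / 2 + guard + (depth - 1) * offset + rx_time


if __name__ == "__main__":
    # Illustrative values in ms: C_T = 250, P_o = 35, threshold = 8.
    print(update_phase(10, 1000, 35, 250, 8))   # phase moves to the candidate
    print(update_phase(33, 1000, 35, 250, 8))   # small change, phase kept
    print(downward_delay(3, 250, 16, 35, 4))
```

The threshold keeps a node from re-aligning, and thereby disturbing its whole subtree, because of measurement jitter of a few milliseconds.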

B. Response Delay Optimization
The upward delay experienced by the response packet is minimized by creating a second wave of extra wake-ups, denoted as response wave (RW), from target node to root node. This second wave is created on-demand, upon detection of a request packet. Packets which belong to an RR protocol are marked in the differentiated services (DSs) field (6 bits) of the IPv6 header. We define the "Request" DS code point as 000001 (request packets) and the "Response" DS code point as 000011 (response packets). These two code points should not interfere with any other protocol, because DS values with the least significant bit set to 1 are unassigned (i.e., free to use) according to RFC 2474 [29].
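
To make the marking concrete, the sketch below sets and reads the 6-bit DS field inside the first 32-bit word of an IPv6 header, where the DS field occupies the upper six bits of the Traffic Class. The two code point values come from the text; the helper names, and the rule that a response is marked only when the request carried the Request code point, are assumptions for illustration.

```python
REQUEST_DSCP = 0b000001   # "Request" code point from the text
RESPONSE_DSCP = 0b000011  # "Response" code point from the text

# First IPv6 header word: version (4 bits) | traffic class (8) | flow label (20).
# The DS field is the upper 6 bits of the traffic class, i.e., bits 27..22.
DS_SHIFT = 22
DS_MASK = 0x3F << DS_SHIFT


def set_dscp(first_word, dscp):
    """Return the header word with its DS field replaced by dscp."""
    return (first_word & ~DS_MASK) | ((dscp & 0x3F) << DS_SHIFT)


def get_dscp(first_word):
    """Extract the 6-bit DS field from the header word."""
    return (first_word & DS_MASK) >> DS_SHIFT


def response_dscp_for(request_dscp):
    """Code point for the response, or None if the request was not marked."""
    return RESPONSE_DSCP if request_dscp == REQUEST_DSCP else None


if __name__ == "__main__":
    word = 0x60000000  # version 6, traffic class 0, flow label 0
    marked = set_dscp(word, REQUEST_DSCP)
    print(hex(marked), get_dscp(marked))
```

Because only the DS bits are masked, the version, ECN bits, and flow label of the packet are left untouched.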
Whenever a node routes a request message, it schedules an RW wake-up at the predicted response arrival time. The desired behavior is shown in Fig. 2. Each hop adds a delay time $P_o$ for the request and a further delay $P_o$ for the response. An additional short fixed time $P_e$ is due to response processing at the target node. Therefore, given the forwarding of a request packet in correspondence to a wake-up at time $t_Q$, the response packet is predicted to arrive at the following instant $t_R$:

$$t_R = t_Q + 2 r P_o + P_e \quad (9)$$

where $r$ is the number of hops (in the DODAG) between the current node and the target node. The RPL implementation was modified so that nodes include the hop count $r$ for each descendant node in the DAO messages. A special case occurs for the DAG root: if it generates a packet too close to the next wake-up, it must wait a whole cycle time $C_T$ to send it. A response packet at the DAG root node relative to a request generated at time $t_G$ is predicted to arrive at the instant given by (10), where $t_W$ is the time of the next wake-up of the first-level child node and $r$ corresponds, in this case, to the depth of the target node. If a node does not receive a response packet at the scheduled RW wake-up, it assumes that packet transmission failed and that the packet is going to be retransmitted later. The RW wake-up is repeated $W_R$ times, every $C_T$, attempting to create another wave for the retransmission; after $W_R$ unsuccessful attempts, the response packet is declared lost.
We remark that when a request packet is routed downward: 1) the next hop and the hop count r to the target node are found in the RPL database and 2) the radio duty cycle (RDC) layer predicts the response time using (9) and schedules an RW wake-up. Whenever a response packet is routed upward: 1) the RDC layer is informed so that no more RW wake-ups are generated and 2) phase-lock is ignored, so that the packet is sent immediately by the MAC layer to take advantage of the RW wake-up.
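
The scheduling of RW wake-ups at an intermediate node can be sketched as follows. The prediction assumes, as described in the text, one $P_o$ per hop for the request, one $P_o$ per hop for the response, and a fixed processing time $P_e$; the function name, the interpretation of $W_R$ as the total number of RW wake-ups, and the numeric values are hypothetical.

```python
def rw_wakeups(t_q, hops, offset, proc_time, cycle_time, attempts):
    """Instants of the RW wake-ups scheduled when a request is forwarded.

    t_q        wake-up instant at which the request was forwarded
    hops       hop count r between this node and the target node
    offset     per-hop phase offset P_o
    proc_time  response processing time P_e at the target
    attempts   number of RW wake-ups (W_R), spaced one cycle time apart
    """
    first = t_q + 2 * hops * offset + proc_time
    return [first + k * cycle_time for k in range(attempts)]


if __name__ == "__main__":
    # Illustrative values in ms: r = 2, P_o = 35, P_e = 5, C_T = 250, W_R = 3.
    for t in rw_wakeups(1000, 2, 35, 5, 250, 3):
        print(t)
```

In a real node the remaining entries would be cancelled as soon as the response is routed upward, as noted in the text.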
The total upward average delay $D_u(h)$ is thus (theoretically) given by (11). The average RR delay $D_{rr}$ is computed by adding the downward delay $D_d$ (2), the upward delay $D_u$ (11), and the processing time $P_e$, obtaining

$$D_{rr}(h) = D_d(h) + D_u(h) + P_e \quad (12)$$

The delay $D_{rr}$ is expected to be lower than $D_r$ (3), because, in general, $2P_o < C_T$.

C. Integration With RAWMAC
Unlike the PA procedure proposed in Section IV-A, the RAWMAC protocol aligns the node wake-ups to minimize the upward delay to the root node [3]. In the following, we integrate RAWMAC and PA by performing two wake-ups at each duty cycle. The PA wake-up is scheduled $P_o$ after the parent PA wake-up, while the RAWMAC wake-up is scheduled $P_o$ before the parent RAWMAC wake-up. The root node wakes up only once for each duty cycle, covering both RAWMAC and PA wake-ups (as shown in Fig. 3).
Given the PA wake-up phase $\phi^{PA}_B$ of a node B, the RAWMAC wake-up phase $\phi^{RAW}_B$ is

$$\phi^{RAW}_B = (\phi^{PA}_B - 2 h_B P_o) \bmod C_T \quad (13)$$

where $h_B$ is the depth of node B in the RPL tree. When a node B receives from parent A an ACK due to the PA wake-up at time $t^{PA}_{B,A}$, it sets its own wake-up phases as in (14). Conversely, when node B receives from parent A an ACK due to the RAWMAC wake-up at time $t^{RAW}_{B,A}$, it sets its own wake-up phases as in (15). The method to differentiate between RAWMAC ACKs and PA ACKs is presented in Section IV-D.

Fig. 4. Two requests are sent by the DAG root to nodes at depths 2 (left) and 3 (right), respectively. The first response follows the next upward wave and reaches the root at its next wake-up. The second response misses it and follows the subsequent one.

According to the phase-lock mechanism of ContikiMAC (Section III-A), a node B sends a packet at the next wake-up of a neighbor node C. When two wake-ups are performed at each duty cycle, two phases are estimated by node B: $\phi^{RAW}_C$ and $\phi^{PA}_C$. Upon receiving a PA ACK from node C, node B computes $\phi^{RAW}_C$ using (13). The symmetric equation computes $\phi^{PA}_C$ given $\phi^{RAW}_C$. We remark that (13) depends on $h_C$: since in an RPL DODAG a node communicates only with its parent and children, $h_C$ can be inferred as

$$h_C = \begin{cases} h_B - 1, & \text{if C is the parent of B} \\ h_B + 1, & \text{if C is a child of B} \end{cases} \quad (16)$$

Fig. 4 illustrates the behavior of an RR protocol when both RAWMAC and PA are used. Requests follow a downward wave generated by PA and responses follow an upward wave generated by RAWMAC. For small values of $h$, the average delay is constant and equal to $D_m(1) = C_T/2 + P_g + P_l + C_T - P_o$. The delay does not depend on the hop count as long as the response can catch the next upward wave. Each hop requires a time offset $P_o$ for the request and a time offset $P_o$ for the response, so that the accumulated multihop delay is

$$D_a(h) = 2 h P_o \quad (17)$$

A packet misses an upward wave whenever the accumulated hop delay $D_a(h)$ exceeds a multiple of a cycle time $C_T$. Taking into account (17), whenever the depth $h$ satisfies condition (18), with $n \in \{1, 2, 3, \ldots\}$, the delay $D_m$ increases by $nC_T$. Hence, the theoretical RR delay $D_m$ increases in discrete steps, as given by (19). In general, the delay $D_m(h)$ is not shorter than the RW delay $D_{rr}(h)$ (12). However, the RW approach is based on the assumption that requests are sent by the DAG root and responses are sent by other nodes in the network, as the upward wave is created only on-demand. Conversely, as both upward and downward waves are always active in RAWMAC, requests from nodes to the DAG root and requests from the DAG root to the nodes are optimized simultaneously.
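
The relation between the two wake-up phases and the step-wise behavior of the delay can be illustrated with a short Python sketch. It assumes that the PA phase of a node at depth $h$ lies $h P_o$ after the root wake-up and the RAWMAC phase $h P_o$ before it, so the two phases differ by $2 h P_o$ modulo $C_T$, and that one upward wave is missed each time the accumulated delay $2 h P_o$ exceeds a further multiple of $C_T$. Function names and values are hypothetical, and this is not the authors' implementation.

```python
def raw_from_pa(phase_pa, depth, offset, cycle_time):
    """RAWMAC wake-up phase derived from the PA wake-up phase."""
    return (phase_pa - 2 * depth * offset) % cycle_time


def pa_from_raw(phase_raw, depth, offset, cycle_time):
    """PA wake-up phase derived from the RAWMAC wake-up phase."""
    return (phase_raw + 2 * depth * offset) % cycle_time


def missed_waves(depth, offset, cycle_time):
    """Number of upward waves missed by a response from the given depth.

    Counts the multiples n * cycle_time (n >= 1) that are strictly
    exceeded by the accumulated hop delay 2 * depth * offset.
    """
    accumulated = 2 * depth * offset
    return max(0, -(-accumulated // cycle_time) - 1)


if __name__ == "__main__":
    # Illustrative values in ms: P_o = 35, C_T = 250.
    print(raw_from_pa(70, 2, 35, 250))
    for h in range(1, 6):
        print(h, missed_waves(h, 35, 250))
```

With these illustrative values the first missed wave occurs at depth 4, where $2 \cdot 4 \cdot 35 = 280$ ms exceeds the 250 ms cycle time.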

D. Link-Layer Acknowledgment Modifications
The approaches proposed in Sections IV-B and IV-C add extra wake-ups to the standard duty cycle: this interferes with the ContikiMAC phase-lock mechanism. In addition to wasted energy and delayed/failed transmissions, incorrect parent phase estimation may cause a phase change in the whole RPL DODAG subtree due to wake-up PA (Section IV-A). Therefore, we modify the link-layer ACK packet so that the type of generating wake-up can be inferred from it. Three kinds of ACKs are possible: 1) a standard RDC ACK, due to a PA wake-up; 2) an RW ACK, caused by the RR extra wake-ups; and 3) an RAWMAC ACK, caused by RAWMAC wake-ups. Phase update should not be performed in 2), because RW wake-ups do not depend on the node phase. In cases 1) and 3), the node receiving the ACK estimates the sender phase as described in Section IV-C. According

V. EXPERIMENTAL EVALUATION

A. Experimental Setup
The proposed approach is implemented in Contiki OS, v. 2.6, carrying out the tests via Cooja, a Java-based simulator for Contiki-based wireless sensor nodes [2]. The radio transmission range is set to 50 m, with 100% transmission success rate. Nodes at distances between 50 and 100 m from the transmission source are set in the "interference" range, meaning that the received data cannot be decoded.
The test network, shown in Fig. 5, comprises 11 nodes and self-organizes into an RPL DODAG using the standard ETX RPL metric. Node 11 has been configured as the DAG root. The remaining ten nodes act as servers and are queried by the root. A UDP "echo" RR protocol has been implemented at the server nodes, with UDP packets sent by the root to each server node. The UDP payload consists of 15 bytes and contains a unique request identifier. Upon reception of a UDP packet destined to itself, a server node sends a response packet containing the same payload to the root node. If the request packet contains the Request code point in the IPv6 DS field, the Response code point is set in the response packet. The delay is computed as the time difference between the instant of request generation and the reception of the corresponding response at the transport layer of the root node.
For more accurate timing, most nodes generate ACK packets in hardware, before handing the received packet to the operating system (Contiki OS). Hence, by default, the node firmware cannot change flags in the ACK packet, which is required by our approach (as highlighted in Section IV-D). Therefore, we modified the Cooja hardware simulator, so that Frame Control Flags could be set or unset by manipulating an appropriate control register.
The management of extra wake-ups does not require a significant algorithmic complexity. Each node maintains an  ordered list of the extra wake-ups which are currently active. In our implementation, an entry in the list contains information on the phase offset of the wake-up (2 bytes), the number of cycle times before the first wake-up is performed (1 byte), the number of wake-ups yet to be performed down from W R (1 byte), and the IPv6 address of the target node (16 bytes). Hence, for each RR simultaneously active in the network, 20 bytes are needed in the memory of each traversed node.
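
The 20-byte list entry described above can be mirrored with Python's `struct` module. The field sizes come from the text; the field order, the little-endian layout, the unit of the phase offset, and the ordering key of the list are assumptions made for this sketch.

```python
import ipaddress
import struct

# 2-byte phase offset, 1-byte cycles before the first wake-up,
# 1-byte wake-ups left (counted down from W_R), 16-byte IPv6 address.
ENTRY = struct.Struct("<HBB16s")


def pack_entry(phase_offset, cycles_before_first, wakeups_left, target):
    """Serialize one extra wake-up entry into 20 bytes."""
    addr = ipaddress.IPv6Address(target).packed
    return ENTRY.pack(phase_offset, cycles_before_first, wakeups_left, addr)


def unpack_entry(blob):
    """Inverse of pack_entry; the address is returned as a string."""
    phase_offset, cycles, left, addr = ENTRY.unpack(blob)
    return phase_offset, cycles, left, str(ipaddress.IPv6Address(addr))


def insert_ordered(entries, entry):
    """Keep entries sorted by (cycles before first wake-up, phase offset)."""
    entries.append(entry)
    entries.sort(key=lambda e: (e[1], e[0]))


if __name__ == "__main__":
    blob = pack_entry(120, 0, 3, "fe80::1")
    print(len(blob), unpack_entry(blob))
```

Keeping the list ordered means that only its head has to be compared against the current time to decide whether an extra wake-up is due.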
Six experiments have been carried out to evaluate the behavior of the proposed protocols. The fixed parameters are shown in Table I. In detail, the value chosen for $P_g$ corresponds to the default guard time of ContikiMAC for Sky nodes [31], computed as 10 · CHECK_TIME + CHECK_TIME_TX, i.e., $10 \cdot 2(t_c + t_r) + 6(t_c + t_r)$, where $t_c$ is the interval between consecutive clear channel assessments (CCAs) and $t_r$ is the time for each CCA, as defined in [4], with $t_c = 1/2000$ s and $t_r = 1/8192$ s. The packet reception time $P_l$ corresponds to the estimated time required at the physical layer to receive and acknowledge a packet, which is limited to a 127-byte maximum payload [30]. The response computation time $P_e$ depends on the application layer and has been measured empirically for our echo protocol. The phase offset $P_o$ and the phase update threshold have been set according to [3], where the best phase offset has been estimated as around 35 ms ([3, Fig. 6(a)]) and the best phase update threshold as between 7 and 9 ms ([3, Fig. 7]).
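
The guard time expression above can be evaluated directly. The short Python sketch below uses exact fractions for $t_c$ and $t_r$ and follows the expansion given in the text, with CHECK_TIME taken as $2(t_c + t_r)$ and CHECK_TIME_TX as $6(t_c + t_r)$; the variable and function names are chosen here for illustration.

```python
from fractions import Fraction

T_C = Fraction(1, 2000)   # interval between consecutive CCAs, in seconds
T_R = Fraction(1, 8192)   # duration of each CCA, in seconds

CHECK_TIME = 2 * (T_C + T_R)
CHECK_TIME_TX = 6 * (T_C + T_R)
GUARD_TIME = 10 * CHECK_TIME + CHECK_TIME_TX  # equals 26 * (T_C + T_R)


def guard_time_ms():
    """Guard time P_g in milliseconds."""
    return float(GUARD_TIME * 1000)


if __name__ == "__main__":
    print(round(guard_time_ms(), 2))
```

The result is a guard time of roughly 16.2 ms, i.e., a small fraction of the cycle times used in the experiments.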
In each experiment, the root first waits 1 min for the formation of the RPL DODAG. Then, the root node sends 250 requests, in sequence, to each of the ten server nodes, for a total of 2500 requests. A request is sent every interval $T_{RR} = 4$ s plus an additional random time uniformly distributed between 0 s and 1 s. If the response does not reach the root node within 5 s after the generation of the request, the response is considered lost and is not included in the results. Overall, about 1% of the responses were lost. The nodes' configurations, for each experiment, are summarized in Table II. As can be seen from the results, PA is always effective in minimizing downward delay. The cycle time was set to: 125 ms (8 Hz) in experiments A and B; and 250 ms (4 Hz) in experiments C, D, E, and F. Cycle times within this range have been used in the original design of ContikiMAC [4] and in previous works [3], [10], [16], [28], [31]. In a real application, the cycle time $C_T$ depends on several factors (e.g., on the available energy to be consumed) and should be chosen depending on the specific application requirements: a long cycle time implies fewer wake-ups, hence it usually reduces resource usage but increases delay. In experiments C and D, PA for downward delay minimization has been integrated with RAWMAC wake-ups for upward delay minimization (Section IV-C). RW (Section IV-B) is used in experiments A and E. In experiment C, both RAWMAC and RW have been used simultaneously: an extra upward wave is created by RW, when the response arrival is predicted, in addition to the upward waves already generated by RAWMAC at fixed intervals.

B. Delay Evaluation
In Fig. 6, the average RR delay for each experiment is shown, together with the associated range between minimum and maximum delays. As expected, the RW upward optimization protocol lowers the average delay, with respect to the same configuration without the use of RW. The delay reduction is more evident for experiment E (about 53% lower with respect to experiment F) than for A (about 24% lower with respect to B). The delay also decreases by about 43% when introducing RAWMAC wake-ups (D) in a network where PA is used (F), but the delay is higher than the one obtained with the RW protocol (E). In fact, with both PA and RAWMAC, packets travel upward and downward as quickly as with RW, but they wait for the next upward wave at the server node. For requests from the DAG root to target nodes, the use of RAWMAC and RW together (C) provides little improvement with respect to the use of RW alone (E). As expected, a shorter cycle time $C_T$ decreases the overall delay both with RW (A with respect to E) and without RW (B with respect to F). More precisely, a shorter cycle time reduces the term $C_T/2$ in (12) (Section IV-B) and provides faster packet collision recovery. In Fig. 6, we also show, for comparison purposes, the theoretically predicted delays as diamonds. The delays are generally consistent with the theoretical performance predicted by (3) (for experiments B and F), (12) (for experiments A, C, and E), and (19) (for experiment D) in Section IV. The theoretical delay expressions are corrected to take into account collisions, adding the supplementary delay in (7). The single collision probability $p_c$ is estimated, based on our simulation results, as 0.027, which corresponds to the fraction of collided packet transmissions during the experiments.

Fig. 7. Average delay by node depth in the DODAG, for each experiment (continuous lines) and predicted by the theoretical models including the supplementary collision delay given by (7) (dashed lines).
Nevertheless, the theoretical predictions, although very accurate, do not fit the results perfectly, since they remain an approximation even with the supplementary delays introduced in (7). Moreover, a few effects were not considered in the analysis, such as temporary network congestion and different collision probabilities in different regions of the network, depending on the local node spatial density. Fig. 7 illustrates the average RR delays, computed separately for each group of nodes at the same depth in the RPL DODAG, and predicted by the theoretical equations recalled in the previous paragraph (i.e., (3) for experiments B and F; (12) for experiments A, C, and E; and (19) for experiment D) and corrected by the additive collision delay given by (7). In general, the delay is approximately a linearly increasing function of the node depth $h$. It is important to note that the theoretical predictions follow the same trends as the experimental results, approximating them with a relative error between 1% and 14%. In experiment D, as predicted, the delay increases slowly, mostly because of the higher collision probability, until depth 4, where the packet misses the next upward wave and the delay increases by $C_T$. At depth 3, the delay of experiment D is comparable with that in experiment C, where RW was also active. At that depth, the time to travel from the DAG root to that node and back (11) is almost a cycle time $C_T$. The response packet waits a very short time at the node, because an upward wave generated by RAWMAC occurs right after the request reception, and the on-demand wave by RW does not provide any additional improvement.
In order to evaluate the behavior of the proposed RW protocol under heavier traffic loads, experiments E and F were repeated considering shorter values of the request interval $T_{RR}$. As shown in Fig. 8, the RR delay is essentially unchanged for $T_{RR}$ equal to 4 s and 2 s, as most responses can reach the root before the next request is sent. For lower $T_{RR}$, however, requests start interfering with each other and the average RR delay rises up to about twice as much for $T_{RR} = 0.5$ s. Nonetheless, the proposed RW protocol (experiment E) is effective at reducing RR delay with respect to standard PA (experiment F) even in these conditions.

C. Delay and Power Consumption
We approximate the power consumption of a simulated node as proportional to the percentage of time in which the radio is active for listening, receiving, or transmitting, as recorded by the Cooja PowerTracker. Indeed, in most real nodes, the radio transceiver is the most power-consuming component. A significant part of the radio activity in an experiment occurs during the formation of the RPL DODAG, when the nodes exchange a large amount of DIO and DAO messages. However, after the first 3 min, the DODAG is stable and the RPL trickle mechanism significantly reduces the number of DIO messages. To prevent bias, we computed the radio activity time separately for the first 3 min and for the remaining approximately 3.5 h of simulation.

Fig. 9. Average RR delay for the network (x-axis) and radio activity time (y-axis). Horizontal error bars show minimum and maximum average delay (see Fig. 6). Vertical error bars show minimum and maximum radio activity percentage over the nodes in the network.
The observed results are shown in Table III. During the first 3 min, the radio activity is around 2% for experiments C, D, E, and F, while it is about 25% lower for experiments A and B. The results can be attributed to the large amount of broadcast DIO messages during network initialization, each of which is repeated for an entire cycle time. Cycle time is indeed shorter in experiments A and B than in C, D, E, and F. For the remaining part of the simulation, however, lower radio activity occurs for longer cycle time, i.e., in experiments E and F, because a smaller number of wake-ups occur. The same cycle time has been used in experiments C and D, but in these cases two wake-ups occur for each cycle time. Therefore, radio activity is comparable to that of a network with half cycle time, i.e., to configurations A and B. Finally, we observed a slight increase in radio activity due to the RR delay optimization RW, up to 9% in C compared to D.
The tradeoff between delay and power consumption is investigated in Fig. 9. Given a network with large cycle time (experiment F), the RR delay may be greatly reduced (experiment E) by using the proposed RW protocol, with a small relative increase in power consumption. A similar but slightly worse delay performance is instead obtained by integrating the RAWMAC protocol (experiment D). Integration with RAWMAC provides fast data harvesting from the nodes, as per RAWMAC properties. However, radio activity in experiment D increases by 40% and is similar to B, which has half the cycle time. An RR delay similar to E can be obtained by adding RW to D, resulting in a small increase in radio activity (experiment C). In Fig. 10, we show the number of packets transmitted at the physical layer during the experiments. The small number of packets for experiments A and B during the first 3 min confirms that, if the cycle time is shorter, broadcast packets are repeated for a shorter time. Moreover, a smaller number of packets are sent in experiments C and D than in E and F during the first 3 min. As two wake-ups are performed for each cycle time in C and D, the ContikiMAC phase-lock mechanism can estimate the phase of the neighboring nodes with fewer probes. Even so, the numbers of transmitted packets beyond 3 min are comparable.

VI. CONCLUSION
In this paper, we have presented a cross-layer approach for the delay optimization of an RR protocol in RPL-based WSNs. The protocol generates upward and downward wake-up waves to minimize packet propagation delay. The performance of the proposed protocol has been evaluated using the Cooja simulator on the Contiki OS. The proposed approach has been compared with a PA approach and has been found to significantly reduce the RR delay, by about 53% with a 250-ms cycle time and by about 24% with a 125-ms cycle time. Moreover, it has been shown that the protocol can co-exist with the RAWMAC low-latency harvesting protocol. During the experiments, a tradeoff between network latency and power consumption has been highlighted. In particular, our results show that the proposed approach reduces the RR delay at the only cost of a slight increase in power consumption (about 9%). However, the use of RAWMAC alongside the proposed protocol almost doubles power consumption. As future work, we plan to integrate the proposed approach with a multicast protocol, which would allow multiple nodes to be queried simultaneously.

ACKNOWLEDGMENT
The work reflects only the authors' views; the European Commission is not liable for any use that may be made of the information contained herein.