Data Gathering Algorithms for Wireless Sensor Networks: a Survey

Recent developments in processor, memory, and radio technology have enabled wireless sensor networks, which are deployed to collect useful information from an area of interest. The sensed data must be gathered and transmitted to a base station, where it is further processed to answer end-user queries. Because the network consists of low-cost nodes with limited battery power, power-efficient methods must be employed for data gathering and aggregation in order to achieve long network lifetimes. In an environment where every sensor node has data to send to the base station in each round of communication, it is important to minimize the total energy consumed by the system per round so that the system lifetime is maximized. If, in addition to minimizing the total energy per round through data fusion and aggregation, the power consumption per node can be balanced, a near-optimal data gathering and routing scheme can be achieved in terms of network lifetime. Several application-specific data gathering protocols for sensor networks have been proposed in the research literature. However, most of the proposed algorithms pay only some attention to network lifetime and energy saving, which are two critical issues for wireless sensor networks. In this paper we explore network lifetime in wireless sensor networks in general, categorize the available data gathering techniques, and analyze the network lifetime achievable with them.


INTRODUCTION
With the introduction of low-cost processor, memory, and radio technologies, it has become possible to build inexpensive wireless micro-sensor nodes. Although these sensors are not as powerful as their expensive macro-sensor counterparts, hundreds or thousands of them can be used to build a high-quality, fault-tolerant sensor network. These networks can collect useful information from an area of interest, especially where the physical environment is so harsh that macro-sensors cannot be deployed. They have a wide range of applications [18], from military to civil, that may be realized by using different types of sensor devices with different capabilities for different kinds of environments [12]. The main constraint of sensor nodes is their very low, finite battery energy, which limits the lifetime and the quality of the network [16]. For that reason, the protocols running on sensor networks must use the resources of the nodes efficiently in order to achieve a longer network lifetime. Research on power management is ongoing, with the aim of reducing power consumption when the nodes become idle [2][21]. When power-efficient communication is considered, it is important to maximize the nodes' lifetimes, reduce bandwidth requirements through local collaboration among the nodes, and tolerate node failures, in addition to delivering the data efficiently.
Data gathering protocols are formulated for configuring the network and collecting information from the desired environment [20]. In each round of a data gathering protocol, data from the nodes needs to be collected and transmitted to the base station (BS) [13], from where the end user can access the data. A simple way of doing that is to aggregate (sum, average, min, max, count) the data originating from different nodes [18]. A more elegant solution is data fusion, which can be defined as the combination of several unreliable data measurements to produce a more accurate signal by enhancing the common signal and reducing the uncorrelated noise. Sensor nodes use different data aggregation techniques to achieve energy efficiency. The aim is to transmit all the data to the base station efficiently so that the lifetime of the network is maximized in terms of rounds, where a round is defined as the process of gathering all the data from the sensor nodes to the base station, regardless of how much time it takes. Existing data gathering protocols [21] can be classified into different categories based on network structure and protocol operation. Routing protocols [13][17] that aim at saving power and prolonging network lifetime are studied intensively in the research community [14][19].
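The difference between simple aggregation and data fusion can be illustrated with a minimal sketch. The function names (`aggregate`, `fuse`, `noisy_measurements`) are hypothetical, and fusion is reduced here to plain averaging of noisy measurements of one common signal, which is only the simplest way to suppress uncorrelated noise.

```python
import random
import statistics


def aggregate(readings):
    """Summarize readings with the five aggregates named in the text."""
    return {
        "sum": sum(readings),
        "average": sum(readings) / len(readings),
        "min": min(readings),
        "max": max(readings),
        "count": len(readings),
    }


def fuse(measurements):
    """Fuse noisy measurements of one signal by averaging them (assumed rule)."""
    return statistics.fmean(measurements)


def noisy_measurements(true_value, noise_std, count, seed=0):
    """Simulate `count` unreliable readings of the same underlying value."""
    rng = random.Random(seed)
    return [true_value + rng.gauss(0.0, noise_std) for _ in range(count)]


summary = aggregate([2.0, 4.0, 6.0])
estimate = fuse(noisy_measurements(20.0, 1.0, 1000))
```

In both cases a cluster head would forward a single value or a small summary instead of every raw reading, which is where the energy saving comes from.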

PRELIMINARIES
This section presents the assumptions and radio model of the network under consideration [10].

Assumptions
A fixed network consisting of immobile sensor nodes [4] and a base station is considered in our study, with the following assumptions [9].
• The network is homogeneous, and all of the sensor nodes have the same initial energy.
• Each sensor node knows its own geographical position.
• All nodes measure the environmental parameters at a fixed rate and send them periodically to the receiver nodes [18].
• The radio channel is symmetric, so that the energy consumed transmitting data from node A to node B is the same as that consumed transmitting from node B to node A.
• Each sensor node can operate either in sensing mode, to monitor the environmental parameters and transmit them to the base station, or in cluster-head mode, to gather data, compress it, and forward it to the BS [13].

Scheme for Radio Energy
The main components of a sensor node and their associated energy consumption are shown in Fig. 1. Both the free-space and the multipath fading channel models [24] are used to compute the energy dissipated while transmitting and receiving information [10]. Here Eelec is the energy spent to operate the transceiver circuit, and Efs and Emp are the energy spent transmitting one bit of data to achieve an acceptable bit error rate, which depends on the transmission distance in the free-space model and the multipath fading model, respectively [17]. If the transmission distance is less than a threshold d0, the free-space model is applied; otherwise, the multipath model is used. The threshold d0 is the distance at which the two models require the same amplifier energy, and is calculated as $d_0 = \sqrt{E_{fs}/E_{mp}}$.
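The radio model described above can be sketched in code. The sketch assumes the widely used first-order form, in which the amplifier energy grows with the square of the distance in the free-space regime and with its fourth power in the multipath regime. The constant values and the function names (`threshold_distance`, `tx_energy`, `rx_energy`) are illustrative assumptions, not values taken from this survey.

```python
import math

# Illustrative constants only (assumptions, not taken from the survey).
E_ELEC = 50e-9      # J/bit spent in the transceiver circuit
E_FS = 10e-12       # J/bit/m^2, free-space amplifier energy
E_MP = 0.0013e-12   # J/bit/m^4, multipath amplifier energy


def threshold_distance(e_fs=E_FS, e_mp=E_MP):
    """Distance d0 at which the free-space and multipath terms are equal."""
    return math.sqrt(e_fs / e_mp)


def tx_energy(bits, distance):
    """Energy in joules to transmit `bits` over `distance` metres."""
    if distance < threshold_distance():
        amplifier = E_FS * distance ** 2   # free-space model
    else:
        amplifier = E_MP * distance ** 4   # multipath fading model
    return bits * (E_ELEC + amplifier)


def rx_energy(bits):
    """Energy in joules to receive `bits` (transceiver electronics only)."""
    return bits * E_ELEC
```

With these assumed constants the threshold is a little under 88 m, so a 50 m hop is charged with the free-space term and a 100 m hop with the multipath term. This is why many of the protocols reviewed below prefer several short hops to one long transmission.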

REVIEW OF DATA GATHERING TECHNIQUES
The powerful underlying and enabling concept of data gathering in wireless sensor networks is the delivery of data by an integrated and orchestrated suite of algorithms [8][11][22][29].

Energy-efficient Routing Algorithm to Prolong Lifetime
The Energy-efficient Routing Algorithm to Prolong Lifetime (ERAPL) was proposed by Yi-hua Zhu et al. [27]. It is able to dramatically prolong network lifetime while expending energy efficiently [3]. In ERAPL, a data gathering sequence (DGS), used to avoid mutual transmission and loop transmission among nodes, is constructed, and each node proportionally transmits traffic to the links confined in the DGS. In addition, a mathematical programming model that includes the minimal remaining energy of nodes and the total energy consumption is presented to optimize network lifetime. Genetic algorithms are then used to find the optimal solution of the proposed programming problem. Results show that ERAPL outperforms the algorithms it is compared with in terms of network lifetime.
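The role of the DGS can be illustrated with a minimal sketch. It is not the construction used in [27]: the ordering rule (farthest from the sink first), the residual-energy weighting, and all function names are assumptions chosen only to show why links confined to a sequence cannot form loops. In ERAPL itself the traffic proportions are obtained from the optimization model.

```python
import math


def build_dgs(positions, sink):
    """Order node ids from farthest to nearest to the sink (assumed rule)."""
    return sorted(positions, key=lambda n: math.dist(positions[n], sink), reverse=True)


def allowed_links(dgs, positions, radio_range):
    """Keep only in-range links that point forward in the sequence.

    Every link goes from an earlier node to a later one, so neither
    mutual transmission nor a transmission loop can occur.
    """
    rank = {n: i for i, n in enumerate(dgs)}
    links = {n: [] for n in dgs}
    for a in dgs:
        for b in dgs:
            in_range = math.dist(positions[a], positions[b]) <= radio_range
            if rank[a] < rank[b] and in_range:
                links[a].append(b)
    return links


def split_traffic(packets, next_hops, energy):
    """Split traffic over forward links in proportion to residual energy (assumed weights)."""
    total = sum(energy[n] for n in next_hops)
    if total == 0:
        return {}
    return {n: packets * energy[n] / total for n in next_hops}


positions = {"a": (0, 0), "b": (3, 0), "c": (6, 0)}
dgs = build_dgs(positions, sink=(9, 0))
links = allowed_links(dgs, positions, radio_range=4)
```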

Harmony Search Algorithm (HSA)
A clustering protocol based on the Harmony Search Algorithm was proposed by D. C. Hoang and R. Kumar [10]. Clustering is one of the methods used to extend the lifetime of the network by applying data aggregation and balancing energy consumption among the sensor nodes. The Harmony Search Algorithm (HSA) is used to minimize the intra-cluster distance and optimize the energy consumption of the network. HSA is a music-based metaheuristic optimization method, analogous to the music improvisation process in which musicians continue to polish their pitches in order to obtain a better harmony. Results demonstrate that the proposed protocol using HSA can reduce energy consumption and improve the network lifetime.
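A minimal sketch of harmony search applied to cluster-head selection is given below. It is not the protocol of [10]: the objective covers only the intra-cluster distance (energy is ignored), the pitch adjustment simply moves to a neighbouring node index, and the parameter names (`hms`, `hmcr`, `par`) and their values are assumptions that follow common harmony search usage.

```python
import math
import random


def intra_cluster_distance(heads, nodes):
    """Sum of distances from every node to its nearest cluster head."""
    return sum(min(math.dist(p, nodes[h]) for h in heads) for p in nodes)


def harmony_search(nodes, k, hms=10, hmcr=0.9, par=0.3, iterations=500, seed=0):
    """Pick k cluster-head indices with a basic harmony search."""
    rng = random.Random(seed)
    n = len(nodes)

    def cost(heads):
        return intra_cluster_distance(heads, nodes)

    # Harmony memory: hms random candidate solutions, best first.
    memory = sorted((rng.sample(range(n), k) for _ in range(hms)), key=cost)
    for _ in range(iterations):
        new = []
        for i in range(k):
            if rng.random() < hmcr:
                pitch = rng.choice(memory)[i]        # memory consideration
                if rng.random() < par:               # pitch adjustment
                    pitch = (pitch + rng.choice((-1, 1))) % n
            else:
                pitch = rng.randrange(n)             # random selection
            new.append(pitch)
        if len(set(new)) < k:
            continue  # skip harmonies that reuse a node as cluster head
        if cost(new) < cost(memory[-1]):             # replace the worst harmony
            memory[-1] = new
            memory.sort(key=cost)
    return memory[0]


nodes = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
best = harmony_search(nodes, k=2)
```

For the two well-separated groups in the usage example, a good solution places one cluster head in each group.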

Novel clustering algorithm OCABTR
The novel clustering algorithm OCABTR was proposed by Ying Liang and Hongwei Gao [28]; it fully considers the characteristics of target occurrence in the monitored area. In OCABTR, the authors adopt the strategy of forming clusters first and selecting cluster heads afterward. The clusters are formed by a genetic algorithm that optimizes the partition so that adjacent nodes which will sense similar targets fall into one cluster. Because it improves the rate of data aggregation within clusters, this approach can effectively reduce redundant data transmission and the total energy consumed in the network. The operation of OCABTR is divided into rounds. Each round begins with a set-up phase in which the clusters are organized, followed by a steady-state phase in which data is transferred from the nodes to the cluster head and on to the BS. The experimental results demonstrate that the proposed algorithm significantly outperforms previous methods in terms of system lifetime.

Multi-Layer Energy-Efficient And Delay-Reducing Chain-Based Data Gathering Protocol
The Multi-Layer Energy-Efficient and Delay-Reducing Chain-Based Data Gathering Protocol (MEDC) was proposed by Lingyun Yuan et al. [15]. The protocol puts forward the idea of a multi-layer chain and uses the minimum total energy algorithm to construct the chain. Leaders are selected according to the maximum residual energy of the nodes. The experimental results show that MEDC works better for WSTMN than LEACH and PEGASIS: it not only prolongs the network lifetime but also reduces the network delay remarkably.
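The two ingredients named above, chain construction and leader selection by residual energy, can be sketched for a single layer. The minimum total energy algorithm is not detailed in this survey, so the sketch substitutes a PEGASIS-style greedy nearest-neighbour chain as a stand-in; the function names are hypothetical.

```python
import math


def greedy_chain(positions, sink):
    """Build a chain starting at the node farthest from the sink.

    Each step appends the unvisited node nearest to the current chain end
    (a greedy stand-in, not the construction used by MEDC).
    """
    remaining = list(positions)
    current = max(remaining, key=lambda n: math.dist(positions[n], sink))
    chain = [current]
    remaining.remove(current)
    while remaining:
        current = min(remaining, key=lambda n: math.dist(positions[n], positions[current]))
        chain.append(current)
        remaining.remove(current)
    return chain


def select_leader(chain, residual_energy):
    """The node with the maximum residual energy becomes the chain leader."""
    return max(chain, key=lambda n: residual_energy[n])


positions = {1: (0, 0), 2: (1, 0), 3: (3, 0), 4: (6, 0)}
chain = greedy_chain(positions, sink=(10, 0))
leader = select_leader(chain, {1: 0.5, 2: 0.9, 3: 0.7, 4: 0.2})
```

Selecting the leader by residual energy rather than by position rotates the costly transmission to the base station among the nodes that can best afford it.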

Steiner Points Grid Routing
Steiner Points Grid Routing was proposed by Chiu-Kuo Liang et al. [5]. In order to reduce the total energy consumption for data transmission between the source node and the sink node, a different virtual grid structure is constructed instead of the virtual grid used in GGR. The idea is to construct the virtual grid structure based on square Steiner trees [19]. Once the sensor nodes are deployed in the sensor field, the sink node starts to construct the grid structure. The sink divides the plane into a grid of cells. The cross-points of the grid are the Dissemination Points (DPs).
The size of the cells, denoted as α, is determined by the sink such that DPs are not within direct transmission range of each other. The sink is the first DP. Knowing its own position and the size of each cell, the sink is able to send a data request (in the form of a data-announcement message) to each adjacent Dissemination Point in the grid. Any node that is within the target region of a received QUERY message stores the appropriate routing information and starts to send the sensed data (in the form of a DATA message) to the sink. The routing information contains the appropriate upstream DNs through which DATA messages will be forwarded. A DN finds the appropriate path for a DATA message depending on which DP or SP it belongs to. If the DN belongs to an SP, it chooses the upstream path along the hexagonal structure.
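The grid construction step can be sketched as follows. The sketch covers only the square grid of Dissemination Points anchored at the sink; the Steiner points and the hexagonal structure are not modelled, and the rectangular field and function names are assumptions.

```python
import math


def dissemination_points(sink, alpha, width, height):
    """Grid cross-points inside a width x height field, anchored at the sink."""
    sx, sy = sink
    cols = range(math.ceil(-sx / alpha), math.floor((width - sx) / alpha) + 1)
    rows = range(math.ceil(-sy / alpha), math.floor((height - sy) / alpha) + 1)
    return [(sx + i * alpha, sy + j * alpha) for i in cols for j in rows]


def adjacent_points(point, alpha, width, height):
    """The up-to-four neighbouring cross-points that lie inside the field."""
    x, y = point
    candidates = [(x + alpha, y), (x - alpha, y), (x, y + alpha), (x, y - alpha)]
    return [(cx, cy) for cx, cy in candidates if 0 <= cx <= width and 0 <= cy <= height]


points = dissemination_points(sink=(0, 0), alpha=50, width=100, height=100)
first_hops = adjacent_points((0, 0), alpha=50, width=100, height=100)
```

Starting from the sink, a data request would be passed to `first_hops` and then propagated from each DP to its own adjacent points.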

Data Gathering Algorithm Based on Energy Level (DGEL)
The DGEL algorithm [25] was proposed by Zheng Wang and Yunsheng Liu. It is executed in a manner very similar to the FLSPT algorithm. The proposed definitions are applied to the algorithm, and pseudocode for DGEL is given by the authors. First, two sets are constructed, namely the sink node set S and the sensor node set V-{S}, which are given different initial values at the beginning. To further reduce the computational cost, a max-priority queue Q, which contains all nodes sorted in descending order of ELPD [25], is introduced into the algorithm. Next, DGEL repeatedly selects the node with the highest priority in Q that is outside the computed routing tree, until the queue is empty. The subsequent steps illustrate how the auxiliary vectors are updated when the current node, which is about to join the routing tree, interferes with its neighbors. To refresh a node's estimate vectors, a comparison is first made in terms of ELPD [25]. In case of equality, the second-level estimates (energy level and bandwidth consumption) are used to break ties. Here α is an adjustment coefficient that allows the energy level to fluctuate within an appropriate boundary.
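The queue-driven tree construction can be sketched as below. ELPD is not defined in this survey, so the sketch uses the energy level of the path to the sink (the minimum node energy level along it) as the priority; that substitution, the omission of the second-level tie-breaking estimates, and the function name are all assumptions.

```python
import heapq


def build_routing_tree(neighbors, energy_level, sink):
    """Grow a tree from the sink, always adding the highest-priority node.

    Priority of a node = best known energy level of its path to the sink.
    Returns (parent pointers, path energy level per node).
    """
    best = {sink: float("inf")}
    parent = {sink: None}
    in_tree = set()
    queue = [(-best[sink], sink)]  # heapq is a min-heap, so priorities are negated
    while queue:
        neg_level, node = heapq.heappop(queue)
        if node in in_tree:
            continue  # stale queue entry
        in_tree.add(node)
        for nb in neighbors[node]:
            if nb in in_tree:
                continue
            # A path is only as strong as its weakest node.
            level = min(-neg_level, energy_level[nb])
            if level > best.get(nb, float("-inf")):
                best[nb] = level
                parent[nb] = node
                heapq.heappush(queue, (-level, nb))
    return parent, best


neighbors = {"s": ["a", "b"], "a": ["s", "c"], "b": ["s", "c"], "c": ["a", "b"]}
parent, best = build_routing_tree(neighbors, {"a": 2, "b": 5, "c": 4}, sink="s")
```

In the usage example node `c` attaches through `b` rather than `a`, because the path through `b` has the higher minimum energy level.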

Energy-efficient and Delay-aware Data Gathering Protocol
The Energy-efficient and Delay-aware Data Gathering Protocol [14] was proposed by Zuzhi Fan and GuangZhou. It is efficient in that it prolongs the lifetime of the network and takes less time to finish a transmission round. The simulation results show that it improves the average Energy×Delay metric compared to other protocols. In its distributed topology construction algorithm, each node locally exchanges its state information, including energy level and location, at a fixed power level to discover neighbor nodes. "Local exchange" here means broadcasting at an output power level corresponding to a cluster radius R. According to the authors' discussion, the minimum value of R is given by an equation in which Parea is the size of the network area. After the neighbor discovery phase, each node knows its degree and its neighbor nodes within range R, and begins the clustering process, which consists of a number of iterations. In each iteration, an 'uncovered' node with a suitable degree selects itself as a candidate cluster-head. An 'uncovered' node is neither a cluster-head nor a member of a cluster. If more than one candidate cluster-head is located in the same cluster range, the node with the larger degree is selected as cluster-head. Once a node is selected as a cluster-head (CH), it asks all neighbor nodes to join the cluster by broadcasting its cluster member list with maximum transmission power. On receiving the message, each node updates its own neighbor list by removing the nodes that belong to the received list, and enters the next iteration. The clustering process ends when the expected number of cluster-heads has been selected or no suitable nodes are available. At the end of the process, the cluster-heads broadcast their status to the other sensors in the network. Each remaining 'uncovered' sensor decides which cluster to join by choosing the cluster-head that requires the minimum cost. Taking energy consumption and transmission delay into consideration, the cost is defined as the product of the distance from the cluster-head and the cluster-head degree.
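The final joining step can be written down directly from the cost definition above. The data layout (a mapping from cluster-head id to position and degree) and the function names are assumptions made for the sketch.

```python
import math


def join_cost(node_pos, head_pos, head_degree):
    """Cost of joining a cluster: distance to the cluster-head times its degree."""
    return math.dist(node_pos, head_pos) * head_degree


def choose_cluster_head(node_pos, heads):
    """Pick the cluster-head with minimum cost; heads maps id -> (position, degree)."""
    return min(heads, key=lambda h: join_cost(node_pos, heads[h][0], heads[h][1]))


heads = {"A": ((3, 4), 6), "B": ((6, 8), 2)}
chosen = choose_cluster_head((0, 0), heads)
```

In the usage example the nearer cluster-head A costs 5 × 6 = 30 while the farther but less loaded B costs 10 × 2 = 20, so the node joins B. The degree factor steers nodes away from crowded clusters, which is how the cost accounts for delay as well as energy.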

Energy-Efficient Data Gathering Protocol (EEDGP)
The Energy-Efficient Data Gathering Protocol (EEDGP) was proposed by Jun Yang et al. [22]. EEDGP includes a clustering method that balances energy consumption, a data prediction transmission strategy, and an energy-aware multi-hop routing algorithm. In the clustering phase, the initial probability of a node being elected cluster head is derived from the mathematical relation between the application's seamless coverage fraction and the number of required cluster heads. In the data aggregation phase, the cluster head uses the spatial correlation of data within a cluster to aggregate the sampled data. In the data transmission phase, cluster heads exploit the temporal correlation of the sampled data and send data to the sink node using a prediction transmission strategy that satisfies the required transmission precision; this strategy greatly prolongs the lifetime of the network. In order to mitigate the hot spot problem among cluster heads, a greedy geographic and energy-aware multi-hop routing algorithm is presented for inter-cluster communication. Simulation results show that EEDGP achieves a longer network lifetime by balancing energy consumption and reducing transmissions while meeting the desired application-specific requirements.
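The idea behind a prediction transmission strategy can be sketched with the simplest possible predictor. The sketch assumes that the cluster head and the sink both predict that the next sample equals the last reported value, and that the cluster head transmits only when the actual sample deviates from that prediction by more than the required precision. The predictor used in [22] is not described in this survey, and the function name is hypothetical.

```python
def prediction_transmit(samples, precision):
    """Return the (index, value) pairs a cluster head actually sends to the sink."""
    sent = []
    predicted = None
    for i, value in enumerate(samples):
        if predicted is None or abs(value - predicted) > precision:
            sent.append((i, value))
            predicted = value  # both sides now predict this value
    return sent


samples = [20.0, 20.1, 20.2, 21.0, 21.1, 19.0]
sent = prediction_transmit(samples, precision=0.5)
```

In the usage example only three of the six samples are transmitted, and the sink's view of every sample still stays within the requested precision of 0.5.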

Energy-Efficient Data Gathering Algorithm (EDGA)
EDGA was proposed by Jing Yang et al. [26] to minimize the energy consumption and maximize the network lifetime. To realize this goal, the basic operations of EDGA are divided into three distinct phases: Cluster Formation, Chain Construction, and Data Transmission. In the first phase, the network is grouped into clusters. In the second phase, the CHs use ACO to construct a chain in their cluster. In the third phase, data is gathered and transmitted to the sink. It is assumed that each node can obtain the status of all its neighbors by broadcasting a topology discovery message, which includes information about the neighbors such as the residual energy and ID. When a node receives the message, it saves the received information in Table 1.
In Table 1, ID is the identification index of a neighbor, Di is the distance from node vi to the sink, Ei is the residual energy of node vi, and qi is the probability of node i becoming the CH. After the CHs receive the data, they use a multi-hop method to send it to the sink. In contrast to single-hop routing algorithms, multi-hop routing algorithms let a node that wants to transmit data to the sink find one or more intermediate nodes, with a focus on load balancing and reducing the communication cost. In WSNs, such a scheme can distribute energy usage among nodes as a means to increase the network lifetime.

DISCUSSIONS
The papers we have reviewed show that data gathering techniques for wireless sensor networks have different strong and weak points with respect to network lifetime and energy consumption. All data gathering algorithms have to comply with a few basic requirements. The most important requirement is that a data gathering algorithm has to be imperceptible. A set of criteria further defines the imperceptibility of an algorithm. These requirements are as follows.

Network density
The network density is the number of nodes per square meter. It varies from one deployment to another and from one node to another within the same deployment, depending on the node distribution.

Energy
Energy is an important parameter [3][7] in a resource-limited network such as a WSN. The remaining battery level appears to be the most important value for this parameter, but it is not the only one. Consequently, an agent will reject some cooperation requests when it considers that there is not enough energy.

Position within the network
In the works reviewed so far, three types of node position are used: normal, edge, and critical. A normal position is a position inside the network where the node has multiple neighbors. An edge node is a node at the border of the network with a restricted view of the network, limited to only one neighbor. A node is considered to be in a critical position if it connects two parts of the network.

Residual Energy
Residual energy is defined as the remaining power of a sensor node whenever the topology changes. It can serve as an indicator of the stability of a link and the survival time of a node.

Energy Level
Energy level has two components in our work. The energy level of a node represents the number of packets that the node can transmit to its neighbors under the constraint of its residual energy, while the energy level of a path is defined as the minimum energy level among the nodes along the path from a sensor node to the sink.
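Both definitions translate into one-line computations. The sketch below assumes a constant energy cost per transmitted packet, which the text does not state; the function names are hypothetical.

```python
def node_energy_level(residual_energy, energy_per_packet):
    """Number of whole packets a node can still transmit (assumes a fixed cost per packet)."""
    return int(residual_energy // energy_per_packet)


def path_energy_level(node_levels):
    """Energy level of a path: the minimum node energy level along it."""
    return min(node_levels)


level = node_energy_level(residual_energy=1.0, energy_per_packet=0.25)
bottleneck = path_energy_level([3, 7, 5])
```

Because the path level is a minimum, a single depleted node limits the whole path regardless of how much energy the other nodes hold.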

Network Throughput
Network throughput is measured by the number of data packets that the sink node can gather before all data routing paths fail due to a shortage of residual energy. It demonstrates the ability of a routing protocol to maximize network throughput under energy restrictions [7].

Network lifetime
Network lifetime is measured by the average number of dead nodes, resulting from path failure, over different data collection rounds. It presents the efficiency of a protocol in extending the number of living nodes so as to prolong the network lifetime [6].

Network Topology
A sensor node should be able to make decisions to estimate, for example, the importance of the sensed information. It should also be able to cooperate with other sensor nodes in order to eliminate inter-sensor-node redundancy and/or to concatenate data.

Latency
The latency requirement depends on the application. In an environment surveillance application, when an event is detected, sensor nodes should be able to report the local processing result to the sink in real time so that appropriate action can be taken promptly [7].

Sensor Node
A sensor node is the core component of a WSN. Sensor nodes can take on multiple roles in a network, such as simple sensing, data storage, routing, and data processing.

Base Station
The base station is at the upper level of the hierarchical WSN. It provides the communication link between the sensor network and the end user.

Power
The power utilized in a sensor network is consumed as sensors perform sensing, processing, and communication tasks. Because the energy of the sensor nodes is limited, network lifetime depends on the efficient use of this energy [21].

RESEARCH DIRECTIONS
The general data gathering algorithms for wireless sensor networks discussed so far, as well as the specific implementations of various WSN algorithms, continue a number of research directions and open some new ones.

Query processing
To query a sensor database, two extensions to the traditional SQL query language have been proposed: QoS extensions and semantic extensions. QoS extensions allow an application to express the factors of concern during the planning and execution of a query. Semantic extensions enable the representation of high-level information on group activities of the objects in a sensor. Databases for wireless sensor networks have been taken into consideration. The proposals are general and thus enable a database view with query-based interaction applicable to most sensor network applications and tasks. The proposed extensions are capable of achieving various task goals, but it remains unclear how difficult it is to implement them in a sensor network query processing and execution system. This will be the main issue for future investigation.

Routing Tree
Given an arbitrary communication graph and a routing tree, a wavelet lifting transform that uses a very low number of raw data transmissions has been defined. As future work, other problems can be considered, including jointly selecting the tree, the transmission schedule, and the transform for a given graph.

Power aware routing
The power-aware routing protocol DGEL, which builds energy-effective paths for data gathering through enhanced link sharing among sensor nodes, is used in small-scale networks. Applying the algorithm, which offers simplicity, high throughput, and prolonged network lifetime, to large-scale communication networks would be another future research direction.

Optimality in algorithms
With the power levels of the sensors considered heterogeneous and adjustable, instead of fixed and homogeneous as in previous work, an upper bound on the lifetime of the optimal data gathering tree has been derived. An iterative algorithm that progressively reduces the maximum normalized load (and hence increases the lifetime) of a given initial tree has been developed. The algorithm can be used to construct new trees periodically or at appropriate time instants to further prolong the lifetime of a sensor network.

CONCLUSION
Wireless sensor networks are more than just a specific form of ad hoc networks. The stringent miniaturization and cost requirements make economic usage of energy and computational power a significantly bigger issue than in normal ad hoc networks. Moreover, specific applications require a rethinking of some of the basic paradigms with which communication protocols are engineered. As wireless sensor networks are still a young research field, much activity is ongoing to solve many open issues. Because some of the underlying hardware problems, especially with respect to energy supply and miniaturization, are not yet completely solved, wireless sensor networks are, at the time of this writing, not yet ready for practical deployment. Nevertheless, these problems could be resolved in the near future. In a wireless sensor network, it is important to prolong network lifetime so that more data can be collected by the sink(s). It is well known that efficient use of energy is critical for network lifetime. This paper outlined critical issues in wireless sensor networks in general and made an extensive study of existing data gathering algorithms, focusing on two key issues: network lifetime and energy saving.

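Figure 1. Sensor node peripherals and energy consumption for data aggregation.
The energy consumption for transmitting an l-bit message over a distance d is
$$E_{Tx}(l, d) = \begin{cases} l E_{elec} + l E_{fs} d^{2}, & d < d_0 \\ l E_{elec} + l E_{mp} d^{4}, & d \ge d_0 \end{cases}$$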

Table 2. Comparison of different techniques