An Efficient Recovery Mechanism with Checkpointing Approach for Cluster Federation

Checkpoint and recovery protocols are commonly used in distributed applications to provide fault tolerance. A distributed system may need to take checkpoints from time to time to protect itself against arbitrary failures. In case of failure, the system rolls back to checkpoints at which global consistency is preserved. Checkpointing is a fault-tolerance technique for recovering from faults and restarting jobs quickly, and checkpointing algorithms for distributed systems have been studied for years. Checkpointing and rollback recovery are widely used techniques that allow a distributed computation to progress in spite of a failure. There are two fundamental approaches to checkpointing and recovery. In the asynchronous approach, processes take their checkpoints independently. Taking checkpoints is therefore very simple, but a recent consistent global checkpoint may not exist, which can cause the computation to roll back. The synchronous approach assumes that a single process other than the application processes invokes the checkpointing algorithm periodically to determine a consistent global checkpoint.

1.INTRODUCTION
Mobility management is one of the major functions of a GSM or a UMTS network that allows mobile phones to work. The aim of mobility management is to track where the subscribers are, allowing calls, SMS and other mobile phone services to be delivered to them. In a cellular telephone network, handoff is the transition for any given user of signal transmission from one base station to a geographically adjacent base station as the user moves around. In an ideal cellular telephone network, each end user's telephone set or modem (the subscriber's hardware) is always within range of a base station. The region covered by each base station is known as its cell. The size and shape of each cell in a network depend on the nature of the terrain in the region, the number of base stations, and the transmit/receive range of each base station.
In theory, the cells in a network overlap; for much of the time, a subscriber's hardware is within range of more than one base station. The network must decide, from moment to moment, which base station will handle the signals to and from each and every subscriber's hardware. Vehicular ad hoc networks are gaining importance for inter-vehicle communication, because they allow for the local communication between vehicles without any infrastructure, configuration effort, and without the high costs of cellular networks. Besides local data exchange, vehicular applications may be extended by accessing Internet services. The access …

mobility support of the vehicular ad hoc network. In this paper we propose MMIP6, a communication protocol that integrates multihop IPv6-based vehicular ad hoc networks into the Internet. Whereas existing approaches are focused on small-scale ad hoc networking scenarios, MMIP6 is highly optimized for scalability and efficiency. The evaluation showed that MMIP6 is a suitable solution providing a scalable mobility support with an acceptable performance characteristic. Typical ITS applications can be categorized into safety, transport efficiency, and information/entertainment applications (i.e., infotainment) [1]. Vehicular ad hoc networks (VANETs) are emerging ITS technologies integrating wireless communications to vehicles. Different Consortia (e.g., Car-to-Car Communications Consortium (C2C-CC) [2]) and standardization organization (e.g., IETF) have been working on various issues in VANETs. C2C-CC aims to develop an open industrial standard for inter-vehicle communication using wireless LAN (WLAN) technology. For example, IEEE 802.11p or dedicated short range communications (DSRC) is an extension of 802.11 standards for inter-vehicle communication by IEEE working group. IETF has standardized Network Mobility Basic Support (NEMO BS) [3] for network mobility in VANETs. Originating from cellular networks, mobility management has been an important and challenging issue to support seamless communication. Mobility management includes location management and handoff management [4]. Location management has the functions of tracking and updating current location of mobile node (MN). Handoff management aims to maintain the active connections when MN changes its point of attachment. VANET is a special type of mobile ad hoc networks (MANETs) [5] with unique characteristics. Due to the high mobility of vehicles, topologies of VANETs are highly dynamic.

PHASES OF CHECKPOINTING
Checkpointing has two phases:
• Saving a checkpoint
• Checkpoint recovery following the failure
To save a checkpoint, the memory and system state necessary to recover from a failure are sent to storage. Checkpoint recovery involves restoring the system state and memory from the checkpoint and restarting the computation from the stored checkpoint [6]. The aim of this thesis is to present an efficient, decentralized, and cost-effective checkpointing algorithm, with better bandwidth utilization and response time, that is suitable for cluster federation. Throughout this survey, we use Np to denote the total number of processes and Nc to denote the number of clusters in the system, where Np is much larger than Nc. Each process is assigned a unique id-number i (1 <= i <= Np).
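The two phases can be illustrated with a minimal Java sketch. It is not the thesis implementation: the class `ProcessState`, its fields, and the use of an in-memory byte array as a stand-in for stable storage are assumptions made only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical process state; the fields are illustrative, not from the paper.
class ProcessState implements Serializable {
    private static final long serialVersionUID = 1L;
    final int processId;
    long counter;

    ProcessState(int processId, long counter) {
        this.processId = processId;
        this.counter = counter;
    }
}

class CheckpointDemo {
    // Phase 1: save a checkpoint by serializing the state.
    // The returned byte array stands in for stable storage (an assumption).
    static byte[] save(ProcessState state) {
        ByteArrayOutputStream storage = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(storage)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return storage.toByteArray();
    }

    // Phase 2: recover by rebuilding the state from the stored checkpoint.
    static ProcessState restore(byte[] checkpoint) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(checkpoint))) {
            return (ProcessState) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ProcessState state = new ProcessState(1, 100);
        byte[] checkpoint = save(state);
        state.counter = 250;          // work done after the checkpoint
        state = restore(checkpoint);  // after a failure, roll back
        System.out.println(state.counter); // prints 100
    }
}
```

Work performed after the checkpoint (the update to 250) is lost on rollback, which is why the interval between checkpoints matters.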

DATA STRUCTURE
In our checkpointing scheme, the checkpointing dependency information of each process in a cluster is maintained by its cluster head (CH) process. Each cluster head sends control messages to the cluster heads of the other clusters, which in turn multicast the message to all currently active processes in their clusters.
This scheme reduces message passing and drastically reduces the number of lost messages, making the system more available, reliable, and faster. When a checkpointing procedure begins, control messages are sent and received mainly among the cluster head processes.
To maintain such additional information for processes, each CH maintains a 2-tuple table

RELATED WORK
S. Kalaiselvi et al. [8] studied checkpointing algorithms for parallel/distributed systems. They observed that most of the algorithms published for checkpointing in message-passing systems are based on the seminal article by Chandy and Lamport. A number of reports have been published in this area that relax the assumptions made in that article and extend it to minimize the overheads of coordination and context saving.
Jiannong Cao et al. [9] addressed the need to apply different checkpointing schemes to different subsystems inside a single target system. The proposed algorithm has several advantages.
Ch. D. V. Subba Rao et al. [10] proposed a new checkpointing protocol combined with selective sender-based message logging. The protocol is free from the problem of lost messages.
Partha Sarathi et al. [11] analyzed, under a stochastic model, several checkpointing and rollback recovery schemes reported in the literature. They derived expressions for the average cost of checkpointing, rollback recovery, message logging, and piggybacking with application messages in synchronous as well as asynchronous checkpointing. For quasi-synchronous checkpointing they showed that, in a system with n processes, the upper and lower bounds of selective message logging are O(n^2) and O(n), respectively.
Y. Manable et al. [12] proposed a distributed coordinated checkpointing algorithm. A consistent global checkpoint is a set of states in which no message is recorded as received in one process and as not yet sent in another process. Their algorithm obtains a consistent global checkpoint for any checkpoint initiation by any process.
S. Monnet et al. [13] suggested that a cluster take two types of checkpoints: processes inside a cluster take checkpoints synchronously, and a cluster takes a communication-induced checkpoint whenever it receives an inter-cluster application message.
J. Cao et al. [14] analyzed the need to integrate independent and coordinated checkpointing schemes for applications running in a hybrid distributed environment containing multiple heterogeneous subsystems.
B. Gupta et al. [15] presented a simple non-blocking roll-forward checkpointing/recovery mechanism for cluster federation. The effect of the domino phenomenon is limited by the time interval between successive invocations of the algorithm, and recovery is as simple as in the synchronous approach.
Suriender Kumar et al. [16] focused on hierarchical non-blocking coordinated checkpointing algorithms that are suitable for distributed computing and eliminate the overhead of taking temporary checkpoints.
Guo hui et al. [17] noted that, in distributed computing systems, processes in different hosts take checkpoints to survive failures. For mobile computing systems, conventional distributed checkpointing schemes need to be reconsidered because of new characteristics such as mobility, low bandwidth, disconnection, low power consumption, and limited memory. They proposed a novel min-process coordinated checkpointing algorithm.
Qiangfeng Yiang et al. [18] presented a novel checkpointing algorithm for achieving fault tolerance in distributed systems with the following desirable features: a process can independently initiate consistent global checkpointing by saving its current state, called a tentative checkpoint, and other processes learn about a consistent global checkpoint initiation through information piggybacked on the application messages or, if necessary, through a limited number of control messages.
Bidyut Gupta et al. [19] presented a non-blocking coordinated checkpointing algorithm suitable for mobile environments. The following advantages make the algorithm suitable for mobile distributed computing systems: (a) it does not take any temporary checkpoint, so the overhead of converting temporary checkpoints to permanent checkpoints is eliminated; (b) it does not use mutable checkpoints, so the overhead of converting them to permanent ones is eliminated; (c) it does not allow any process to take useless checkpoints. It uses very few control messages, and participating processes are interrupted fewer times.
Lalit Kumar et al. [20][7] presented a non-blocking minimum-process coordinated checkpointing protocol that not only minimizes useless checkpoints but also minimizes the overall bandwidth required over wireless channels. Their protocol proposes reducing the height of the checkpointing tree, which reduces the uncertainty period and the number of induced checkpoints.
J. L. Kim et al. [21] presented a new efficient synchronized checkpointing protocol that exploits the dependency relation between processes in distributed systems. In their protocol, a process takes a checkpoint when it knows that all processes on which it computationally depends have taken their checkpoints, so the process need not always wait for the decision made by the checkpointing coordinator as in conventional synchronized protocols.

5.WORKING MODEL
In the proposed algorithm, when communication occurs between two processes in different clusters, dependencies are generated between the checkpoints taken in those clusters. These dependencies must be tracked so that the application can be restarted from a consistent state. In our work, following an idea adopted from the literature, it is the sending process that ensures that none of its sent messages can remain an orphan (received but not sent).
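The orphan condition can be made concrete with a small Java sketch. It assumes, purely for illustration, that each process records per-peer message counts at its checkpoint; the array names and this bookkeeping are hypothetical and are not the data structures of the proposed algorithm.

```java
class ConsistencyCheck {
    // sentAtCkpt[i][k]: number of messages P_i had sent to P_k at its checkpoint.
    // recvAtCkpt[k][i]: number of messages P_k had received from P_i at its checkpoint.
    // A set of checkpoints is consistent if no message is recorded as received
    // but not recorded as sent (no orphan message).
    static boolean isConsistent(int[][] sentAtCkpt, int[][] recvAtCkpt) {
        int n = sentAtCkpt.length;
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                if (i != k && recvAtCkpt[k][i] > sentAtCkpt[i][k]) {
                    return false; // P_k saw a message P_i does not remember sending
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] sent = {{0, 3}, {1, 0}};
        int[][] recvOk = {{0, 1}, {2, 0}};     // P_1 received 2 of the 3 sent by P_0
        int[][] recvOrphan = {{0, 1}, {4, 0}}; // P_1 received 4, but P_0 sent only 3
        System.out.println(isConsistent(sent, recvOk));     // prints true
        System.out.println(isConsistent(sent, recvOrphan)); // prints false
    }
}
```

A message that was sent but not yet received (a lost or in-transit message) does not violate consistency; it is handled during recovery instead.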
When the CH of any cluster initiates the checkpointing procedure by sending the control message to the other clusters, the current cluster's sequence number SN is piggybacked on each inter-cluster control message and on the first application message sent to any process in any cluster during the X-th global checkpoint interval. The CH of every other cluster is responsible for storing these SN values for synchronization among clusters. Only the first application message sent by a CH to another cluster carries the piggybacked information; any other process in the source cluster need not piggyback the SN value if it sends another message to the same cluster before the next invocation of the proposed algorithm.
Step 2:
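The piggybacking rule described above can be sketched in Java as follows. This is a simplified illustration under stated assumptions, not the proposed algorithm itself: the classes `AppMessage` and `ClusterHeadSketch`, the method names, and the choice to increment SN at the start of each interval are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.OptionalInt;
import java.util.Set;

// Hypothetical application message; SN is present only when piggybacked.
class AppMessage {
    final int fromCluster;
    final String payload;
    final OptionalInt piggybackedSn;

    AppMessage(int fromCluster, String payload, OptionalInt piggybackedSn) {
        this.fromCluster = fromCluster;
        this.payload = payload;
        this.piggybackedSn = piggybackedSn;
    }
}

class ClusterHeadSketch {
    final int clusterId;
    private int sn = 0; // sequence number of the current checkpoint interval
    private final Set<Integer> contacted = new HashSet<>();      // clusters already given SN
    private final Map<Integer, Integer> knownSn = new HashMap<>(); // latest SN per remote cluster

    ClusterHeadSketch(int clusterId) {
        this.clusterId = clusterId;
    }

    // Assumption: a new interval bumps SN and resets the "already informed" set.
    void startCheckpointInterval() {
        sn++;
        contacted.clear();
    }

    // Only the first application message to a given cluster in this interval carries SN.
    AppMessage send(int toCluster, String payload) {
        boolean first = contacted.add(toCluster);
        return new AppMessage(clusterId, payload,
                first ? OptionalInt.of(sn) : OptionalInt.empty());
    }

    // The receiving CH stores the newest SN it has seen from each cluster.
    void receive(AppMessage m) {
        if (m.piggybackedSn.isPresent()) {
            knownSn.merge(m.fromCluster, m.piggybackedSn.getAsInt(), Integer::max);
        }
    }

    int knownSnOf(int cluster) {
        return knownSn.getOrDefault(cluster, 0);
    }

    public static void main(String[] args) {
        ClusterHeadSketch a = new ClusterHeadSketch(1);
        ClusterHeadSketch b = new ClusterHeadSketch(2);
        a.startCheckpointInterval();
        AppMessage first = a.send(2, "m1");
        AppMessage second = a.send(2, "m2");
        b.receive(first);
        b.receive(second);
        System.out.println(first.piggybackedSn.isPresent());  // prints true
        System.out.println(second.piggybackedSn.isPresent()); // prints false
        System.out.println(b.knownSnOf(1));                   // prints 1
    }
}
```

Because the SN travels on the first application message as well as on the control message, a receiving cluster can still learn the latest SN when the control message is late.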

CHECKPOINTING ALGORITHM
Step End of algorithm.

RECOVERY ALGORITHM
For each process $P_k$:
    for each $i$ with $1 \le i \le n$, $i \ne k$:
        if $S^{x}_{ik} > R^{x}_{ki}$:
            $P^{*}$ records the sequence numbers $(R^{x}_{ki} + 1)$ to $S^{x}_{ik}$ in lost-form-$P_i^k$; // messages with sequence numbers $(R^{x}_{ki} + 1)$ to $S^{x}_{ik}$ are the lost messages from $P_i$ to $P_k$
    $P^{*}$ forms the total order of all lost messages sent by every $P_i$, $i \ne k$, to $P_k$ using lost-form-$P_i^k$ and the message log $MESG_k$ for $P_k$
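The lost-message computation in the step above can be sketched in Java. The sketch assumes that the matrices `sent` and `received` hold the sequence numbers $S^{x}_{ik}$ and $R^{x}_{ki}$, uses 0-based process indices, and omits the total-ordering step that relies on the message log. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LostMessages {
    // sent[i][k]: sequence number of the last message P_i sent to P_k (S_ik).
    // received[k][i]: sequence number of the last message P_k received from P_i (R_ki).
    // Returns the sequence numbers R_ki + 1 .. S_ik, i.e. the messages lost from P_i to P_k.
    static List<Integer> lostFrom(int i, int k, int[][] sent, int[][] received) {
        List<Integer> lost = new ArrayList<>();
        for (int seq = received[k][i] + 1; seq <= sent[i][k]; seq++) {
            lost.add(seq);
        }
        return lost;
    }

    public static void main(String[] args) {
        int n = 3;
        int[][] sent = new int[n][n];
        int[][] received = new int[n][n];
        sent[0][2] = 5;     // P_0 sent messages 1..5 to P_2
        received[2][0] = 3; // P_2 recorded only messages 1..3 from P_0
        sent[1][2] = 2;     // P_1 sent messages 1..2 to P_2
        received[2][1] = 2; // P_2 recorded both of them

        int k = 2;
        for (int i = 0; i < n; i++) {
            if (i != k) {
                System.out.println("lost from P" + i + " to P" + k + ": "
                        + lostFrom(i, k, sent, received));
            }
        }
    }
}
```

When $S^{x}_{ik} \le R^{x}_{ki}$ the loop body never runs and the returned list is empty, which matches the guard in the step above.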

SYSTEM MODEL
In the existing scheme, a message sent by a sender is received by all processes, whether or not they are participating in the current checkpoint interval, which wastes bandwidth, increases communication cost, and causes traffic congestion. In the proposed checkpointing algorithm, messages travel in composite form: the cluster head is responsible for sending the message to the other cluster heads, and each cluster head then multicasts the message to all of its active processes. This uses bandwidth efficiently and makes the system more cost effective and less prone to traffic congestion.

IMPLEMENTATION OF SYSTEM MODEL
The experiment uses sets of PC-memory distributed databases on the Java platform. To evaluate the implementation of the algorithm, the following parameters were considered: bandwidth utilization, number of clusters, number of messages to be sent individually, number of messages sent as a composite message, number of checkpoints taken, and number of messages to be recovered. The last parameter matters because this thesis attempts to develop a recovery system that reduces the number of messages that need to be recovered.

• Bandwidth Utilization Versus Number of clusters
For the proposed algorithm, we examined whether the number of composite messages depends on the number of clusters. Figure 6.1 shows that the number of composite messages increases with the number of clusters, but only gradually. The advantage of this behavior is discussed for three cases:

Small Number of Clusters:
If there are few clusters, then the number of messages to be sent is almost equal to the number of clusters. When the number of clusters sending messages is small, the number of composite messages sent is also low, and hence the bandwidth is used efficiently.

Average Number of Clusters:
If there is an average number of clusters, then the number of messages sent is almost two-thirds of the number of clusters. So, as the number of clusters increases, the number of composite messages increases only slightly, and bandwidth usage remains efficient.

Large Number of Clusters:
If there is a large number of sending clusters, the number of messages sent is almost half the number of sending clusters. Hence bandwidth usage is still efficient.
• Bandwidth Usage: As shown in Figure 6.2, the proposed technique uses the least bandwidth of the techniques compared.

• Number of individual messages to be sent versus number of composite messages sent
In the proposed algorithm, if one or more processes in the sending cluster have to send messages to one or more processes at the receiving end, which may be a cluster or a site, the sending cluster first builds a composite message comprising all the individual messages received from the processes under it. The sending cluster then sends this composite message to the receiving cluster, which, on receipt, multicasts the appropriate extracted messages to its active receiving processes. Figure 6.3 compares the number of individual messages to be sent with the number of composite messages sent. It shows that, across checkpoints, the number of composite messages sent remains almost constant and is far smaller than the number of individual messages to be sent, which saves bandwidth. The graph thus indicates that the proposed algorithm can improve bandwidth usage.
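The composition and extraction steps described above can be sketched in Java. The sketch is illustrative only: the class `Msg`, its fields, and the representation of a composite message as a list of individual messages per destination cluster are assumptions, not the format used in the thesis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical individual message addressed to a process in some cluster.
class Msg {
    final int toCluster;
    final int toProcess;
    final String body;

    Msg(int toCluster, int toProcess, String body) {
        this.toCluster = toCluster;
        this.toProcess = toProcess;
        this.body = body;
    }
}

class CompositeSketch {
    // Sending cluster head: bundle individual messages into one composite
    // message per destination cluster.
    static Map<Integer, List<Msg>> compose(List<Msg> outgoing) {
        Map<Integer, List<Msg>> composites = new TreeMap<>();
        for (Msg m : outgoing) {
            composites.computeIfAbsent(m.toCluster, c -> new ArrayList<>()).add(m);
        }
        return composites;
    }

    // Receiving cluster head: extract only the messages addressed to
    // currently active processes before multicasting them.
    static List<Msg> extractForActive(List<Msg> composite, Set<Integer> activeProcesses) {
        List<Msg> deliverable = new ArrayList<>();
        for (Msg m : composite) {
            if (activeProcesses.contains(m.toProcess)) {
                deliverable.add(m);
            }
        }
        return deliverable;
    }

    public static void main(String[] args) {
        List<Msg> outgoing = Arrays.asList(
                new Msg(2, 10, "a"), new Msg(2, 11, "b"),
                new Msg(3, 20, "c"), new Msg(2, 12, "d"));
        Map<Integer, List<Msg>> composites = compose(outgoing);
        System.out.println(outgoing.size() + " individual messages, "
                + composites.size() + " composite messages");
        Set<Integer> active = new HashSet<>(Arrays.asList(10, 12));
        System.out.println(extractForActive(composites.get(2), active).size()
                + " messages delivered in cluster 2");
    }
}
```

In this sketch the number of inter-cluster transmissions equals the number of destination clusters rather than the number of individual messages, which is the source of the bandwidth saving discussed above.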

• Number of messages to be recovered with increased number of clusters
As shown in Figure 6.4, fewer messages need to be recovered with the proposed technique than with the method of B. Gupta et al.

Figure 6.4 Messages recovered versus number of clusters
This is because, in the proposed technique, the sending cluster first sends a control message to the receiving clusters. If a receiving cluster does not receive the control message in time, it still learns about the latest checkpoint when it receives the first application message from the sending cluster, which carries the latest SN. This minimizes the chance of lost or orphan messages and hence the number of messages to be recovered. Moreover, the receiving cluster sends no acknowledgement: even if it does not receive the control message, the first application message sent to any one of its nodes informs it of the latest checkpoint taken, and all the active processes in the cluster then update their synchronization numbers with the latest received SN.

3.CONCLUSIONS
Checkpointing protocols require the processes to take periodic checkpoints with varying degrees of coordination. At one end of the spectrum, coordinated checkpointing requires the processes to coordinate their checkpoints to form global consistent system states. Coordinated checkpointing generally simplifies recovery and garbage collection, and yields good performance in practice. At the other end of the spectrum, uncoordinated checkpointing does not require the processes to coordinate their checkpoints, but it suffers from potential domino effect, complicates recovery, and still requires coordination to perform output commit or garbage collection. Between these two ends are communication-induced checkpointing schemes that depend on the communication patterns of the applications to trigger checkpoints. These schemes do not suffer from the domino effect and do not require coordination. Recent studies, however, have shown that the nondeterministic nature of these protocols complicates garbage collection and degrades performance.
In this thesis, we have presented a simple, non-blocking, efficient, and low-cost checkpointing algorithm for cluster federation. The time interval between successive invocations of the algorithm ensures a minimum number of lost or delayed messages. The main features of the algorithm are: 1) A minimum number of processes take checkpoints. 2) Cluster-to-cluster communication is minimal. 3) Each cluster maintains its own data structures for keeping the checkpointing dependency information, resulting in a decentralized approach and faster execution. 4) Wastage of bandwidth is minimal.

Future Scope
Messages are not secured: they currently travel in plain text, so future work should address message security.
The algorithm is implemented on a peer-to-peer model and is used with a shared database.