Published February 4, 2017 | Version 10006700
Journal article Open

Hierarchical Checkpoint Protocol in Data Grids

Description

Grids of computing nodes have emerged as a representative means of
connecting distributed computers and resources scattered all over the
world for computation and distributed storage. Because fault tolerance
is difficult to achieve in a decentralized grid environment where
resource availability fluctuates, it can be combined with replication
in data grids. The objective of our work is to provide fault tolerance
in data grids through a data replication-driven model based on
clustering. The performance of the protocol is evaluated with the
OMNeT++ simulator. The computational results show the efficiency of
our protocol in terms of recovery time and the number of processes
involved in rollback.
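To make the cluster-scoped rollback idea concrete, the following minimal, self-contained C++ sketch illustrates hierarchical (per-cluster) checkpointing in general terms. It is not the authors' protocol or their OMNeT++ model; the Process and Cluster types, the two-cluster layout, and the counter-style "state" are assumptions made purely for the example. The point it shows is that a failure forces only the affected cluster's members to roll back, which is what keeps recovery time and the number of rolled-back processes low.

#include <iostream>
#include <string>
#include <vector>

// Hypothetical process with some application state and its last checkpoint.
struct Process {
    int id;
    int state = 0;      // current application state (illustrative counter)
    int checkpoint = 0; // last locally stored checkpoint of that state
};

// A cluster groups processes under one coordinator; checkpoints and
// rollbacks are coordinated per cluster rather than across the whole grid.
struct Cluster {
    std::string name;
    std::vector<Process> members;

    // Cluster-level coordinated checkpoint: every member saves its state.
    void checkpointAll() {
        for (auto &p : members) p.checkpoint = p.state;
    }

    // Cluster-scoped rollback: only this cluster's members restore state.
    // Returns the number of processes forced to roll back.
    int rollbackAll() {
        for (auto &p : members) p.state = p.checkpoint;
        return static_cast<int>(members.size());
    }
};

int main() {
    // Two illustrative clusters of a small "grid".
    Cluster c1{"cluster-A", {{1}, {2}, {3}}};
    Cluster c2{"cluster-B", {{4}, {5}}};

    // Do some work, then take cluster-level checkpoints.
    for (auto *c : {&c1, &c2})
        for (auto &p : c->members) p.state += p.id;
    c1.checkpointAll();
    c2.checkpointAll();

    // More work after the checkpoints.
    for (auto &p : c1.members) p.state *= 10;

    // A failure in cluster-A rolls back only cluster-A's members,
    // leaving cluster-B untouched: fewer processes in rollback.
    int rolledBack = c1.rollbackAll();
    std::cout << "processes rolled back: " << rolledBack
              << " (out of " << c1.members.size() + c2.members.size()
              << " in the grid)\n";
    return 0;
}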

Files

10006700.pdf (144.4 kB)
md5:aa5f39a41a0045bd78b5c459f2620de9

Additional details

References

  • O. Marin, "The DARX framework: Adapting fault tolerance for agent systems," Ph.D. dissertation, Université du Havre, 2003.
  • B. Hamid, "Distributed fault-tolerance techniques for local computations," Ph.D. dissertation, Université Bordeaux I, 2007.
  • F. Reichenbach, "Service SNMP de détection de faute pour des systèmes répartis," Ph.D. dissertation, École Polytechnique de Lausanne, 2002.
  • M. Wiesmann, F. Pedone, and A. Schiper, "A systematic classification of replicated database protocols based on atomic broadcast," in 3rd European Research Seminar on Advances in Distributed Systems, 1999.
  • X. Besseron, "Tolérance aux fautes et reconfiguration dynamique pour les applications distribuées à grande échelle," Ph.D. dissertation, Université de Grenoble, 2010.
  • N. M. Ndiaye, "Techniques de gestion des défaillances dans les grilles informatiques tolérantes aux fautes," Ph.D. dissertation, Université Pierre et Marie Curie, 2013.
  • S. Drapeau, "Un canevas adaptable de services de duplication," Ph.D. dissertation, Institut National Polytechnique de Grenoble, 2003.
  • R. Souli-Jbali, M. S. Hidri, and R. B. Ayed, "Dynamic data replication-driven model in data grids," in 39th Annual Computer Software and Applications Conference (COMPSAC Workshops 2015), Taichung, Taiwan, July 1-5, 2015, pp. 393–397.
  • K. M. Chandy and L. Lamport, "Distributed snapshots: Determining global states of distributed systems," ACM Transactions on Computer Systems, vol. 3, no. 1, pp. 63–75, 1985.
  • H. S. Paul, A. Gupta, and R. Badrinath, "Hierarchical coordinated checkpointing protocol," in International Conference on Parallel and Distributed Computing Systems, 2002, pp. 240–245.
  • K. Bhatia, K. Marzullo, and L. Alvisi, "Scalable causal message logging for wide-area environments," Concurrency and Computation: Practice and Experience, vol. 15, no. 3, pp. 243–250, 2003.
  • S. Monnet, C. Morin, and R. Badrinath, "Hybrid checkpointing for parallel applications in cluster federations," in 3rd Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids, 2004, pp. 773–782.
  • E. Meneses, C. L. Mendes, and L. V. Kale, "Team-based message logging: Preliminary results," in 4th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2010.
  • J.-M. Yang, K. Li, W.-W. Li, and D.-F. Zhang, "Trading off logging overhead and coordinating overhead to achieve efficient rollback recovery," Concurrency and Computation: Practice and Experience, vol. 21, no. 3, pp. 819–853, 2009.
  • A. Guermouche, "Nouveaux protocoles de tolérance aux fautes pour les applications du calcul haute performance," Ph.D. dissertation, Université Paris-Sud, 2011.
  • D. B. Johnson and W. Zwaenepoel, "Sender-based message logging," in The Seventeenth Annual International Symposium on Fault-Tolerant Computing, 1987, pp. 14–19.
  • A. Varga and R. Hornig, "An overview of the OMNeT++ simulation environment," in Proceedings of the 1st International Conference on Simulation Tools and Techniques for Communications, Networks and Systems & Workshops, 2008, pp. 60:1–60:10.