Published May 31, 2019 | Version v1
Journal article Open

Optimizing Data Reliability and Consistency in Hadoop Environments by Introducing ACID Capabilities

Description

The rapid growth of big data has made the Hadoop platform and its distributed computing framework very popular. Hadoop offers scalable, cost-effective storage and processing of large datasets, but it lacks the strong data consistency guarantees associated with the ACID (Atomicity, Consistency, Isolation, and Durability) properties normally found in traditional RDBMSs. This paper explores how incorporating ACID capabilities into the Hadoop ecosystem affects data reliability and consistency and addresses issues encountered in big data applications. The proposed solution integrates ACID principles into the Hadoop environment by developing strong transaction management, rigorous data isolation, and more efficient failure recovery mechanisms. The paper analyzes the applicability of this technique to use cases such as financial analytics, healthcare data management, and supply chain optimization. It closes with the scope and future directions of ACID-enabled Hadoop, which could transform the big data landscape by providing enterprise-class data integrity within a scalable, distributed computing platform.
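To make the atomicity and failure recovery ideas in the description more concrete, here is a minimal sketch that is not taken from the paper. It assumes a common commit pattern for file-based stores: stage the new data in a temporary file, then publish it with a single rename. The local filesystem and `os.replace` stand in for a distributed filesystem rename, and the function name `atomic_write` and the `.staging-` prefix are hypothetical.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write data so readers see the old or the new content, never a partial file.

    Illustrative assumption: a local-filesystem stand-in for a
    stage-then-rename commit; not the mechanism described in the paper.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Stage the new content in the target directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".staging-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # durability: flush bytes to disk before commit
        os.replace(tmp_path, path)  # commit: one rename publishes the new version
    except BaseException:
        # Rollback: discard the staged file; the target is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the write fails partway, the previous version of the file remains readable and no staged file is left behind. Isolation between concurrent writers would need additional coordination (for example locking or versioning), which this sketch does not attempt.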

Files

EJAET-6-5-73-78.pdf (309.6 kB)
md5:36deedc4745dae0339dd7360b5343fcd
