Published April 30, 2023 | Version v1
Journal article Open

Cross-Domain Data Engineering: Challenges, Solutions, and Future Directions

Description

As big data grows exponentially, cross-domain data engineering has become increasingly vital to innovation and strategic decision-making across sectors. This paper examines the challenges and opportunities of cross-domain data engineering and its role in combining disparate data sources for richer analytical insight. It covers the complexities of integrating, processing, and analyzing data across diverse domains, addressing four key challenges: data heterogeneity, quality and consistency, scalability, and privacy and security. By examining technological solutions, including semantic web technologies, data virtualization, and machine learning for data integration, the study offers a roadmap for overcoming the obstacles inherent in cross-domain data analysis. The paper also discusses the evolving landscape of privacy regulations and the importance of robust data governance frameworks. Drawing on case studies and outlining future directions, it provides guidance for researchers and practitioners working in this field. Ultimately, the paper underscores the significance of cross-domain data engineering in unlocking the potential of big data for innovative solutions and strategic advantage.
