Video/Audio Open Access
Traditional data in Digital Humanities projects come in various formats (structured, semi-structured, textual) and need substantial transformations (encoding and tagging, stemming, lemmatization, etc.) before they can be managed and analyzed. To fully master this process, we propose data lakes as a solution to data siloing and to the variety problem of big data. We describe data lake projects that we currently run in close collaboration with researchers in the humanities and social sciences, and discuss the lessons learned from running them.
P. Liu, S. Loudcher, J. Darmont, E. Perrin, J.P. Girard, M.O. Rousset, "Metadata model for an archeological data lake", Digital Humanities (DH 2020), Ottawa, Canada, July 2020 (https://dh2020.adho.org/).
P.N. Sawadogo, E. Scholly, C. Favre, E. Ferey, S. Loudcher, J. Darmont, "Metadata Systems for Data Lakes: Models and Features", 1st International Workshop on BI and Big Data Applications (BBIGAP@ADBIS 2019), Bled, Slovenia, September 2019; Communications in Computer and Information Science, Vol. 1064, Springer, Heidelberg, Germany, 440-451.
P.N. Sawadogo, T. Kibata, J. Darmont, "Metadata Management for Textual Documents in Data Lakes", 21st International Conference on Enterprise Information Systems (ICEIS 2019), Heraklion, Crete-Greece, May 2019, 72-83; INSTICC, Setúbal, Portugal (Vol. 1).
P.N. Sawadogo, J. Darmont, "On Data Lake Architectures and Metadata Management", Journal of Intelligent Information Systems, 2020.