Video/Audio Open Access

Data Lakes for Digital Humanities

Darmont, Jérôme; Favre, Cécile; Loudcher, Sabine; Noûs, Camille

Traditional data in Digital Humanities projects come in various formats (structured, semi-structured, textual) and need substantial transformations (encoding and tagging, stemming, lemmatization, etc.) before they can be managed and analyzed. To fully master this process, we propose data lakes as a solution to data siloing and big data variety problems. We describe data lake projects we currently run in close collaboration with researchers in the humanities and social sciences and discuss the lessons learned from running these projects.

  • P. Liu, S. Loudcher, J. Darmont, E. Perrin, J.P. Girard, M.O. Rousset, "Metadata model for an archeological data lake", Digital Humanities (DH 2020), Ottawa, Canada, July 2020.

  • P.N. Sawadogo, E. Scholly, C. Favre, E. Ferey, S. Loudcher, J. Darmont, "Metadata Systems for Data Lakes: Models and Features", 1st International Workshop on BI and Big Data Applications (BBIGAP@ADBIS 2019), Bled, Slovenia, September 2019; Communications in Computer and Information Science, Vol. 1064, Springer, Heidelberg, Germany, 440-451.

  • P.N. Sawadogo, T. Kibata, J. Darmont, "Metadata Management for Textual Documents in Data Lakes", 21st International Conference on Enterprise Information Systems (ICEIS 2019), Heraklion, Crete-Greece, May 2019, 72-83; INSTICC, Setúbal, Portugal (Vol. 1).

  • P.N. Sawadogo, J. Darmont, "On Data Lake Architectures and Metadata Management", Journal of Intelligent Information Systems, 2020.
