Published September 30, 2020 | Version v1
Video/Audio | Open Access

Data Lakes for Digital Humanities

  • 1. Université de Lyon, Lyon 2, ERIC UR 3083
  • 2. Université de Lyon, Lyon 2, Laboratoire Cogitamus

Description

Data in Digital Humanities projects traditionally come in various formats (structured, semi-structured, textual) and require substantial transformations (encoding and tagging, stemming, lemmatization, etc.) before they can be managed and analyzed. To fully master this process, we propose data lakes as a solution to data siloing and big data variety problems. We describe the data lake projects we currently run in close collaboration with researchers in the humanities and social sciences, and discuss the lessons learned from running these projects.

Files

ddh20-darmont-favre-loudcher-nous.mp4 (129.3 MB)
md5:c8cdb963c76b8f80787eed9c11f4ab5e

Additional details

Related works

Is derived from
Conference paper: 10.1145/3423603.3424004 (DOI)

References

  • P. Liu, S. Loudcher, J. Darmont, E. Perrin, J.P. Girard, M.O. Rousset, "Metadata model for an archeological data lake", Digital Humanities (DH 2020), Ottawa, Canada, July 2020 (https://dh2020.adho.org/).
  • P.N. Sawadogo, E. Scholly, C. Favre, E. Ferey, S. Loudcher, J. Darmont, "Metadata Systems for Data Lakes: Models and Features", 1st International Workshop on BI and Big Data Applications (BBIGAP@ADBIS 2019), Bled, Slovenia, September 2019; Communications in Computer and Information Science, Vol. 1064, Springer, Heidelberg, Germany, 440-451.
  • P.N. Sawadogo, T. Kibata, J. Darmont, "Metadata Management for Textual Documents in Data Lakes", 21st International Conference on Enterprise Information Systems (ICEIS 2019), Heraklion, Crete, Greece, May 2019; INSTICC, Setúbal, Portugal, Vol. 1, 72-83.
  • P.N. Sawadogo, J. Darmont, "On Data Lake Architectures and Metadata Management", Journal of Intelligent Information Systems, 2020.