Exploring Data Provenance in Handwritten Text Recognition Infrastructure: Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions. Starting the Conversation on How We Could Get It Done

Romein, C. Annemieke; Hodel, Tobias; Gordijn, Femke; Zundert, Joris J. van; Chagué, Alix; Lange, Milan van; Jensen, Helle Strandgaard; Stauder, Andy; Purcell, Jake; Terras, Melissa M.; Heuvel, Pauline van den; Keijzer, Carlijn; Rabus, Achim; Sitaram, Chantal; Bhatia, Aakriti; Depuydt, Katrien; Afolabi-Adeolu, Mary Aderonke; Anikina, Anastasiia; Bastianello, Elisa; Benzinger, Lukas Vincent; Bosse, Arno; Brown, David; Charlton, Ash; Dannevig, André Nilsson; Gelder, Klaas van; Go, Sabine C.P.J.; Goh, Marcus J.C.; Gstrein, Silvia; Hasan, Sewa; Heide, Stefan von der; Hindermann, Maximilian; Huff, Dorothee; Huysman, Ineke; Idris, Ali; Keijzer, Liesbeth; Kemper, Simon; Koenders, Sanne; Kuijpers, Erika; Rønsig Larsen, Lisette; Lepa, Sven; Link, Tommy O.; Nispen, Annelies van; Nockels, Joe; Noort, Laura M. van; Oosterhuis, Joost Johannes; Popken, Vivien; Estrella Puertollano, María; Puusaag, Joosep J.; Sheta, Ahmed; Stoop, Lex; Strutzenbladh, Ebba; Sijs, Nicoline van der; Spek, Jan Paul van der; Trouw, Barry Benaissa; Van Synghel, Geertrui; Vučković, Vladimir; Wilbrink, Heleen; Weiss, Sonia; Wrisley, David Joseph; Zweistra, Riet

doi:10.5281/zenodo.7267245

Published November 30, 2022 | Version v1

Journal article Open

1. Huygens Institute for the History and Culture of the Netherlands; Vrije Universiteit Amsterdam
2. University of Bern
3. Huygens Institute for the History and Culture of the Netherlands
4. ALMAnaCH, Inria, Paris; Université de Montréal
5. NIOD Institute for War, Holocaust, and Genocide Studies
6. Aarhus Universitet / Aarhus University
7. READ-COOP SCE
8. American Historical Association
9. University of Edinburgh
10. Amsterdam City Archives
11. Albert-Ludwigs-Universität: Freiburg im Breisgau
12. KNAW Humanities Cluster Amsterdam
13. Instituut voor de Nederlandse Taal
14. Bonn Center for Dependency and Slavery Studies at the University of Bonn
15. Universiteit van Amsterdam
16. Bibliotheca Hertziana – Max Planck Institute for Art History
17. Vrije Universiteit Amsterdam
18. KNAW Humanities Cluster, Amsterdam
19. Trinity College Dublin
20. University of Edinburgh; National Library of Scotland
21. National Archives of Norway
22. Vrije Universiteit Brussel; State Archives Brussels
23. University of Innsbruck; State Library of Tyrol
24. CCS Content Conversion Specialists GmbH
25. University of Basel
26. University Library of Tübingen
27. Dutch National Archives
28. Danish National Archives
29. Rahvusarhiiv Estonia
30. University of Amsterdam
31. Research Centre for Hanse and Baltic History (FGHO)
32. Friedrich Alexander Universität Erlangen-Nürnberg
33. independent citizen scientist
34. University of Aberdeen
35. Utrechts Archief
36. NYU Abu Dhabi

This paper discusses best practices for sharing and reusing Ground Truth in Handwritten Text Recognition infrastructures, as well as ways to reference and acknowledge contributions to the creation and enrichment of data within these systems. We discuss how one can place Ground Truth data in a repository and, subsequently, inform others through HTR-United. Furthermore, we want to suggest appropriate citation methods for HTR data, models, and contributions made by volunteers. Moreover, when using digitised sources (digital facsimiles), it becomes increasingly important to distinguish between the physical object and the digital collection. These topics all relate to the proper acknowledgement of the labour put into digitising, transcribing, and sharing Ground Truth HTR data. They also point to broader issues surrounding the use of machine learning in archival and library contexts, and to how the community should begin to acknowledge and record both contributions and data provenance.

Files

Exploring_Data_Provenance.pdf (1.8 MB), md5:c8e011b32c20c3d61a9ef1f28ac35a56

	All versions	This version
Views	2,931	1,991
Downloads	3,380	1,000
Data volume	12.5 GB	1.9 GB
