Exploring Data Provenance in Handwritten Text Recognition Infrastructure: Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions. Starting the Conversation on How We Could Get It Done
Creators
- Romein, C. Annemieke (1)
- Hodel, Tobias (2)
- Gordijn, Femke (3)
- Zundert, Joris J. van (3)
- Chagué, Alix (4)
- Lange, Milan van (5)
- Jensen, Helle Strandgaard (6)
- Stauder, Andy (7)
- Purcell, Jake (8)
- Terras, Melissa M. (9)
- Heuvel, Pauline van den (10)
- Keijzer, Carlijn (5)
- Rabus, Achim (11)
- Sitaram, Chantal (12)
- Bhatia, Aakriti (12)
- Depuydt, Katrien (13)
- Afolabi-Adeolu, Mary Aderonke (14)
- Anikina, Anastasiia (15)
- Bastianello, Elisa (16)
- Benzinger, Lukas Vincent (17)
- Bosse, Arno (18)
- Brown, David (19)
- Charlton, Ash (20)
- Dannevig, André Nilsson (21)
- Gelder, Klaas van (22)
- Go, Sabine C.P.J. (17)
- Goh, Marcus J.C. (17)
- Gstrein, Silvia (23)
- Hasan, Sewa (17)
- Heide, Stefan von der (24)
- Hindermann, Maximilian (25)
- Huff, Dorothee (26)
- Huysman, Ineke (3)
- Idris, Ali (17)
- Keijzer, Liesbeth (27)
- Kemper, Simon (27)
- Koenders, Sanne (17)
- Kuijpers, Erika (17)
- Rønsig Larsen, Lisette (28)
- Lepa, Sven (29)
- Link, Tommy O. (17)
- Nispen, Annelies van (5)
- Nockels, Joe (20)
- Noort, Laura M. van (17)
- Oosterhuis, Joost Johannes (30)
- Popken, Vivien (31)
- Estrella Puertollano, María (17)
- Puusaag, Joosep J. (17)
- Sheta, Ahmed (32)
- Stoop, Lex (33)
- Strutzenbladh, Ebba (34)
- Sijs, Nicoline van der (13)
- Spek, Jan Paul van der (33)
- Trouw, Barry Benaissa (33)
- Van Synghel, Geertrui (3)
- Vučković, Vladimir (17)
- Wilbrink, Heleen (35)
- Weiss, Sonia (7)
- Wrisley, David Joseph (36)
- Zweistra, Riet (33)
- 1. Huygens Institute for the History and Culture of the Netherlands; Vrije Universiteit Amsterdam
- 2. University of Bern
- 3. Huygens Institute for the History and Culture of the Netherlands
- 4. ALMAnaCH, Inria, Paris; Université de Montréal
- 5. NIOD Institute for War, Holocaust, and Genocide Studies
- 6. Aarhus Universitet / Aarhus University
- 7. READ-COOP SCE
- 8. American Historical Association
- 9. University of Edinburgh
- 10. Amsterdam City Archives
- 11. Albert-Ludwigs-Universität: Freiburg im Breisgau
- 12. KNAW Humanities Cluster Amsterdam
- 13. Instituut voor de Nederlandse Taal
- 14. Bonn Center for Dependency and Slavery Studies at the University of Bonn
- 15. Universiteit van Amsterdam
- 16. Bibliotheca Hertziana – Max Planck Institute for Art History
- 17. Vrije Universiteit Amsterdam
- 18. KNAW Humanities Cluster, Amsterdam
- 19. Trinity College Dublin
- 20. University of Edinburgh; National Library of Scotland
- 21. National Archives of Norway
- 22. Vrije Universiteit Brussel; State Archives Brussels
- 23. University of Innsbruck; State Library of Tyrol
- 24. CCS Content Conversion Specialists GmbH
- 25. University of Basel
- 26. University Library of Tübingen
- 27. Dutch National Archives
- 28. Danish National Archives
- 29. Rahvusarhiiv Estonia
- 30. University of Amsterdam
- 31. Research Centre for Hanse and Baltic History (FGHO)
- 32. Friedrich Alexander Universität Erlangen-Nürnberg
- 33. independent citizen scientist
- 34. University of Aberdeen
- 35. Utrechts Archief
- 36. NYU Abu Dhabi
Description
This paper discusses best practices for sharing and reusing Ground Truth in Handwritten Text Recognition infrastructures, as well as ways to reference and acknowledge contributions to the creation and enrichment of data within these systems. We discuss how one can place Ground Truth data in a repository and, subsequently, inform others through HTR-United. Furthermore, we want to suggest appropriate citation methods for HTR data, models, and contributions made by volunteers. Moreover, when using digitised sources (digital facsimiles), it becomes increasingly important to distinguish between the physical object and the digital collection. These topics all relate to the proper acknowledgement of labour put into digitising, transcribing, and sharing Ground Truth HTR data. This also points to broader issues surrounding the use of machine learning in archival and library contexts, and how the community should begin to acknowledge and record both contributions and data provenance.
Files
Exploring_Data_Provenance.pdf (1.8 MB)
md5:c8e011b32c20c3d61a9ef1f28ac35a56