Project deliverable Open Access
Broeder, Daan; Trippel, Thorsten; Degl'Innocenti, Emiliano; Giacomi, Roberta; Sanesi, Maurizio; Kleemola, Mari; Moilanen, Katja; Ala-Lahti, Henri; Jordan, Caspar; Alfredsson, Iris; L'Hours, Hervé; Ďurčo, Matej
The SSHOC project aims to build a Social Sciences and Humanities Open Cloud (SSHOC) as part of the European Open Science Cloud (EOSC) by implementing a cloud-based infrastructure. The goal is to provide a recognisable and accessible environment for data, tools, services and training, and to maximise data reuse through Open Science and FAIR principles.
In line with these goals, this report provides an inventory of the data formats and metadata standards that are in use and relevant for the research infrastructures managed by the main SSHOC stakeholders. It also recommends specific formats and standards for increasing interoperability, and sets priorities for providing conversion services and planning solutions.
In the context of this deliverable, selected experts from the SSHOC stakeholders' infrastructures were interviewed about research data and metadata use in their respective infrastructures. Special attention was paid to interoperability aspects.
The interviews and the desk research indicated both the diversity of the SSHOC communities and the diversity of metadata standards used and needed. Therefore, this deliverable recommends a variety of metadata standards and data formats. The recommended metadata standards include domain-specific metadata standards for each domain, as well as Dublin Core and relaxed DataCite for all domains.
The diversity of the SSHOC communities is also shown in their data: there is a large selection of different media types and an enormous selection of data formats. The recommended data formats include a small selection of formats for some general media types, such as images and text annotations. The format for controlled vocabularies was also examined, and SKOS was selected as the recommended format. It is worth noting that hardly any recommendation can fulfil all the use cases, so other metadata standards, data formats and formats for controlled vocabularies may still be used when necessary.
The priorities for providing conversion services and planning solutions were decided based on the interviews and on the needs of other SSHOC tasks. The challenges and solutions are analysed for each SSHOC community, and both the priority and the needed actions are specified. We expect the prioritisation to change in the course of the project as new requirements appear.
Beyond WP3, this document is relevant to WP4, WP5, WP7 and WP9. Ongoing discussion between the work packages and tasks is needed.
The authors of this report wish to thank the interviewed informants for their time and valuable contribution.
D3.1 Report on SSHOC (meta)data interoperability problems (approved 18Nov2019).pdf