Project deliverable Open Access
Umerle, Tomasz; Błaszczyńska, Marta; Wnuk, Magdalena; Franczak, Mateusz; Stojanovski, Jadranka; Rosinski, Cezary; Wołczuk, Nikodem; Mikołajczyk-Bareła, Agnieszka; Karlińska, Agnieszka; Ogrodniczuk, Maciej; Pęzik, Piotr; Kramer, Bianca; Peroni, Silvio; De Santis, Luca; Balkan, Lorna; Inkret, Ana; Przysiecka, Karoline; Maryl, Maciej; Tóth-Cifra, Erszébet
This report focuses on metadata as a specific type of research data in the humanities by analysing four key metadata elements: persistent identifiers (PIDs), abstracts, keywords and citations. It defines those elements, outlines the challenges of processing them in the humanities and presents the challenges they pose for GoTriple as a metadata aggregator of this kind of research data.
The assumption is that GoTriple is itself a specific kind of research dataset that can and will be reused by stakeholders such as other metadata aggregators, indexers, publishers and information services (e.g. providers of scholarly metrics), as well as by scientists interested in data-driven research (cultural analytics, scientometrics, bibliometrics, etc.). This demands a good understanding of the key metadata elements important to GoTriple’s aggregation and enrichment processes (abstracts, keywords) and their development (PIDs, citations).
Chapter 1 defines the aim of the deliverable, the context of its creation and its audience.
Chapter 2 discusses the specificity of research data in the humanities and this report’s position in the rich discussion of the topic.
Chapter 3 – dedicated to PIDs – presents an overview of the topic and the challenges related to the uptake of PIDs in the humanities, such as the role of cultural heritage data for the humanities and the importance of bibliodiversity and multilingualism (subchapter 3.1). It then discusses the processing of PIDs from GoTriple’s data providers, focusing on data dispersion and heterogeneity (subchapter 3.2).
Chapter 4 – dedicated to keywords – begins with a typology of keywords and the standards they are expected to adhere to (subchapter 4.1). Subchapter 4.2 tackles the automated generation of keywords and proposes different approaches applicable in the context of GoTriple. Subchapter 4.3 presents the current approach to keyword organisation in GoTriple, with a focus on the GoTriple vocabulary, which responds to the need for keyword LOD-ification and can in the future be reused for automated keyword generation.
Chapter 5 – dedicated to abstracts – starts with a comprehensive presentation of the abstract ecosystem, including a specific perspective on SSH. Subchapter 5.2 offers solutions to the issue of “missing abstracts”, tailored to the needs of the GoTriple platform.
Chapter 6 – dedicated to citations – offers an overview of the topic and its relevance to the SSH. Subchapter 6.2 analyses issues related to GoTriple’s expression of citation data, especially the challenges of processing different citation formats and of citation data quality.
Each chapter concludes with a summary of the guidelines for the specific metadata type for the humanities.
D8.5 Guidelines on the research data in the humanities (FINAL).pdf