D5.2 First Report on Data Infrastructure update and extension

Richards Julian

doi:10.5281/zenodo.4922749

Published June 8, 2020 | Version v1

Project deliverable Open

Richards Julian¹

1. University of York: Archaeology Data Service

Contributors

Other:

Williams Stephanie³

Researcher (6):

1. University of South Wales
2. CNR-ISTI
3. PIN
4. FORTH

This deliverable reports the work done in Tasks 5.1, 5.2, 5.3, 5.4, 5.5 and 5.7 during the initial 18 months (period 1 of the project), assesses it, and plans the related activities for the second period (months 19-36). Work done under Task 5.6 has previously been reported under D5.1; related activity is reported in D4.1.

ARIADNEplus seeks to update and extend the research data infrastructure delivered within the preceding ARIADNE project (2013-17). It extends ARIADNE in several dimensions:

1. Wider geographical coverage with new partners.

2. Wider disciplinary coverage with a greater emphasis on the sub-domains of palaeoanthropology, bioarchaeology, environmental archaeology, material sciences, dating methods, and on the archaeology of standing structures.

3. The time span considered.

4. The depth of database integration, with a greater degree of item-level integration.

5. Greater integration of texts.

6. Broader audiences.

7. Greater range of services.

This deliverable describes the update procedures being followed by partners and introduces the steps in the aggregation pipeline. There are two options for aggregation: the standard approach, which uses a suite of tools for the semi-automated aggregation of large datasets, and a basic approach for the manual upload of small numbers of records. The majority of partners use the standard approach, which has been developed and tested on over 1m records from a range of UoY-ADS datasets. Aggregation proceeds according to an agreed priority list. ARIADNE subject types are agreed, and partners choose whether to upload their data via XML files or via automated harvesting methods such as OAI-PMH. Partners following the standard approach must:

1. Describe their data according to the AO-Cat using the 3M tool, usually with one mapping per partner.

2. Map subject terms to the Getty AAT using the Vocabulary Matching Tool.

3. Define any period terms used so that they are uploaded to Perio.do.
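
For partners who choose automated harvesting, the request side of OAI-PMH can be sketched as follows. This is a minimal illustration and not the ARIADNEplus harvester: the function name `list_records_url` and the endpoint `https://example.org/oai` are hypothetical, and `oai_dc` is used only because it is the metadata format every OAI-PMH repository is required to support. The `verb`, `metadataPrefix`, `set` and `resumptionToken` arguments come from the OAI-PMH 2.0 protocol.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # In OAI-PMH 2.0, resumptionToken is an exclusive argument:
        # it must not be combined with metadataPrefix or set.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            # Optional selective harvesting of one set of records.
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Assumed endpoint, for illustration only.
first_page = list_records_url("https://example.org/oai")
next_page = list_records_url("https://example.org/oai", resumption_token="abc")
```

A harvester would request `first_page`, read the `resumptionToken` element from the response, and keep requesting further pages until the repository returns an empty token.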

Where temporal data needs cleaning to achieve consistent use of date ranges and periods, partners use an additional tool, Time Spans, to normalise date ranges. They must also ensure that spatial data is compliant with WGS 84. Partners using Fast Cat instead enter their data records manually in a spreadsheet-like tool whose column headings already correspond to the AO-Cat core mandatory fields, so that a single mapping can cover multiple partners. Data aggregated by both routes is then transformed and loaded into the ARIADNE triplestore, and is also used to create the indices that power Elasticsearch in the ARIADNE portal. Data is initially loaded into a “ghost” portal for checking before it is published.

Progress has so far been monitored via a shared Google Sheet, which provides an aggregation dashboard, but during the next phase we will use a new software tool, Activity Dash (under implementation in WP14), which will make it easier to monitor progress across a large number of partners.

During the reporting period we have aggregated over 1.5m records covering archaeological sites and monuments and archaeological ‘events’ (i.e. excavation and other fieldwork activities) from a small number of partners: UoY-ADS, AIAC, HNM, DANS-KNAW and ARUP. During the remainder of 2020 our next priority will be to complete the aggregation of this type of data and move on to a broader range of data types, including the development of application profiles for data types which extend the subject range of the ARIADNE infrastructure and take us into item-level aggregation. This was an experimental area in ARIADNE, but will be a priority for the development of VREs in ARIADNEplus during the second phase of the project.

We will also extend the range of the ARIADNEplus data infrastructure to catalogue information about people, institutions and services, and will ensure compatibility with other catalogues throughout. We have done pilot work with EOSC in the TEXTCROWD application, and going forward we will ensure the visibility of ARIADNE resources within the EOSC Hub as that is implemented.
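
The temporal and spatial checks mentioned above (consistent date ranges, WGS 84 compliance) can be illustrated with a short sketch. It does not reproduce the Time Spans tool, whose internals are not described here; the helper names `normalise_range` and `in_wgs84_bounds` are hypothetical, and the date parser assumes plain CE years only. The coordinate check only tests that values fall within valid latitude and longitude ranges, which catches projected coordinates supplied by mistake but cannot confirm the datum itself.

```python
import re


def normalise_range(text):
    """Return (start_year, end_year) from strings like '1200-1300' or '1250'.

    Assumption: plain CE years only; BCE dates and named periods
    are out of scope for this sketch.
    """
    years = [int(y) for y in re.findall(r"\d+", text)]
    if len(years) == 1:
        # A single year is treated as a range of length zero.
        return (years[0], years[0])
    if len(years) == 2:
        # Order the pair so that start <= end.
        return (min(years), max(years))
    raise ValueError(f"cannot interpret date range: {text!r}")


def in_wgs84_bounds(lat, lon):
    """Check that a coordinate pair is a plausible WGS 84 lat/lon in degrees."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


print(normalise_range("1300 - 1200"))
print(in_wgs84_bounds(53.96, -1.08))
```

Values such as eastings and northings from a national grid fail the bounds check, which flags records that still need reprojecting before aggregation.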

Notes

All ARIADNEplus deliverables are available at: https://ariadne-infrastructure.eu/resources/ariadneplus-deliverables/
