Project deliverable Open Access

D12.2 – Mid-term report on data integration

Bardi Alessia; Marberg Johan Fihn; Theodoridou Maria

Williams Stephanie
Baglioni Miriam; Casarosa Vittore; Millet Pablo; Ottonello Enrico

This deliverable describes the activities carried out and the results achieved during the first two years of the ARIADNEplus project within four tasks of Work Package 12 (WP12). The objectives of WP12 are to develop, deliver and maintain the components of the ARIADNEplus infrastructure that support the integration and interoperability of the data provided by the members of the consortium. The catalogue data integrated by the ADI (the Aggregative Data Infrastructure developed in T12.2) are made available, as RDF records compliant with the AO-Cat model, to the ARIADNEplus portal (T12.3) and to the pilots developed in WP16 via two services: (1) the ARIADNEplus AC (the data and knowledge cloud developed in T12.1), which exposes a SPARQL API, and (2) an Elasticsearch server, which provides a full-text index of the content of the AC. The deeper integration of item-level data (item-level integration) is investigated in T12.4, in order to support research questions that require richer information than is available in AO-Cat. The design, development and deployment activities have been guided by the requirements of all the members of the consortium, especially those involved in WP4 and WP5. For the development of the new features of the portal, a Portal Working Group was formed, including technical and non-technical members from SND, PIN, USW, CNR, ADS, and SRFG. By December 2020, WP12 had delivered all the components and implemented the aggregation workflow devised in collaboration with WP5.
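The AC's SPARQL API is the main entry point for clients that need the AO-Cat records as RDF. As a minimal sketch, assuming the API follows the W3C SPARQL 1.1 Protocol ("query via GET"), a client builds the request as below. The endpoint URL, the class IRI and the use of `rdfs:label` are hypothetical placeholders, not the actual AC configuration; the code only builds the request and does not send it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical endpoint: the report does not give the AC's SPARQL URL.
AC_SPARQL_ENDPOINT = "https://ac.example.org/sparql"


def records_query(class_iri: str, limit: int = 10) -> str:
    """Return a SELECT query listing resources of a class with their labels.

    rdfs:label is an assumption; the real AO-Cat properties may differ.
    """
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?resource ?label WHERE {\n"
        f"  ?resource a <{class_iri}> ;\n"
        "            rdfs:label ?label .\n"
        f"}} LIMIT {int(limit)}\n"
    )


def build_query_request(endpoint: str, query: str) -> Request:
    """Build (without sending) a SPARQL 1.1 Protocol 'query via GET' request."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Request the standard SPARQL 1.1 Query Results JSON format.
    return Request(url, headers={"Accept": "application/sparql-results+json"}, method="GET")


if __name__ == "__main__":
    # Hypothetical class IRI standing in for an AO-Cat class.
    q = records_query("https://example.org/ao-cat/ArchaeologicalResource", limit=5)
    req = build_query_request(AC_SPARQL_ENDPOINT, q)
    print(req.get_method(), req.full_url)
    # The URL-encoded parameter decodes back to the original query text.
    assert parse_qs(urlsplit(req.full_url).query)["query"][0] == q
```

Passing the request to `urllib.request.urlopen` against a real endpoint would return a JSON document with `head.vars` and `results.bindings`, as defined by the SPARQL 1.1 results format.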
The ADI includes the services and tools required for data collection, transformation, and harmonisation: the 3M Editor, developed and maintained by FORTH, for defining mappings from local metadata formats to AO-Cat; the Vocabulary Matching Tool, developed and maintained by USW, for defining mappings from local subject terms to Getty AAT terms; and the ARIADNEplus aggregator, developed and maintained by CNR. The aggregator is based on the D-Net software toolkit: it collects the providers' XML records, executes the 3M mappings through the integrated X3ML toolkit, and implements the aggregation workflows defined in collaboration with WP5. The AC includes a knowledge graph implemented with GraphDB (free edition) and a Spring Boot application that mediates the interactions among the aggregator, GraphDB, and Elasticsearch. Elasticsearch provides a full-text index of the content available in the AC for use by the ARIADNEplus portal. The ARIADNEplus portal is developed using PHP, Vue.js, JavaScript, Vuex, Tailwind, and Font Awesome. Besides standard free-text and faceted search, the portal offers advanced features based on the concepts of temporal, spatial, and topical coverage.

To enable data curators to check the quality of the data before it is made available on the public portal, WP12 set up a staging environment in which the collected data are aggregated, loaded into a staging AC, and indexed on a staging portal. Upon confirmation by data experts, the data are then pushed to the production environment. The production environment makes the ARIADNEplus portal available to the public, whereas the staging environment is available only to consortium members, for checking data quality and testing new portal functionality.
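The portal's free-text and faceted search over the Elasticsearch index can be illustrated with a request body in the Elasticsearch Query DSL. This is a minimal sketch, not the portal's actual query. It assumes an index with the hypothetical fields `title`, `description`, `subject_label`, `temporal_from` and `temporal_until`, since the report does not document the index mapping. Free text goes into a scoring `multi_match` clause. A selected facet value and a temporal-coverage window become non-scoring filters. A `terms` aggregation returns the per-subject counts used to draw the facet.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical field names: the portal's Elasticsearch mapping is not given here.
TEXT_FIELDS = ["title^2", "description"]  # title boosted x2 (illustrative choice)
SUBJECT_FIELD = "subject_label"           # assumed keyword field for the subject facet
PERIOD_FROM_FIELD = "temporal_from"       # assumed start year of temporal coverage
PERIOD_UNTIL_FIELD = "temporal_until"     # assumed end year of temporal coverage


def build_search_body(
    text: str,
    subject: Optional[str] = None,
    period: Optional[Tuple[int, int]] = None,
    size: int = 20,
) -> Dict[str, Any]:
    """Build a Query DSL body: free text + facet filters + facet counts."""
    must: List[Dict[str, Any]] = (
        [{"multi_match": {"query": text, "fields": TEXT_FIELDS}}]
        if text
        else [{"match_all": {}}]
    )
    filters: List[Dict[str, Any]] = []
    if subject is not None:
        # A selected facet value restricts results without affecting scores.
        filters.append({"term": {SUBJECT_FIELD: subject}})
    if period is not None:
        start, end = period
        # Keep records whose coverage overlaps [start, end]: the record
        # starts no later than `end` and ends no earlier than `start`.
        filters.append({"range": {PERIOD_FROM_FIELD: {"lte": end}}})
        filters.append({"range": {PERIOD_UNTIL_FIELD: {"gte": start}}})
    return {
        "size": size,
        "query": {"bool": {"must": must, "filter": filters}},
        # Terms aggregation: per-subject document counts for the facet list.
        "aggs": {"subjects": {"terms": {"field": SUBJECT_FIELD, "size": 10}}},
    }


if __name__ == "__main__":
    body = build_search_body("roman villa", subject="villas", period=(-100, 400))
    print(json.dumps(body, indent=2))
```

Such a body would be sent as the JSON payload of a `_search` request on the relevant index. Because the staging and production environments each have their own index, the same query could be run against either one.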

