D4.1 Initial report on dataset integration
This deliverable reports on the work carried out within WP4, comprising Tasks 4.1, 4.2, 4.3, 4.4 and 4.5, during the initial 18 months of the project (period 1). It assesses that work and plans the related activities for the second period, i.e. months 19-36. The document is structured according to the activities carried out by the work package. After an overall presentation of the main aims of WP4 in Chapter 2, Chapter 3 provides an overview of the conceptual model being developed in ARIADNEplus and its various components. It focuses on the new AO-Cat model, which is designed to describe the datasets in the Catalogue and improves on the previous ACDM model developed in ARIADNE. The full AO-Cat ontology is provided in the Appendix of this deliverable. The chapter also presents the fundamental categories defined to classify the information in the Catalogue and make it easily available on the new ARIADNEplus Portal, following the recommendations of the FAIR principles, which inspired the model. It further provides an overview of the application profiles under development in WP14, along with a review of the compatible models already in use by some partners. Chapter 4 presents the tools developed and implemented in ARIADNEplus to facilitate the encoding of metadata according to the AO-Cat model and to assist content providers in all phases of their ingestion, mapping, transformation, enrichment and publication activities. Of particular importance here is the 3M Mapping Tool developed by FORTH, which allows users to define in detail the correspondences between legacy metadata schemas and the entities of the model, in order to achieve an optimal level of integration. The Fast Cat tool, designed for the rapid acquisition of information directly in the AO-Cat format, is also offered to partners who do not use any format for their metadata or who have only a limited amount of data to provide.
Chapter 5 presents the Helpdesk, a collaborative service provided by the ARIADNEplus platform to assist content providers at every stage of the data contribution process, from preparation and the definition of mappings to the fine-tuning of the data harvesting and data acquisition mechanisms in the ARIADNEplus infrastructure. The service is based on a ticketing system and offers efficient interaction with the dedicated team of experts set up to provide all the information needed to support the process. Chapter 6 documents the status of the integration and shows how the partners are adapting their data to the ARIADNEplus standards, together with the priorities defined for ingestion according to the progress of these operations. The encoding, enrichment and standardisation work also relies on the various vocabularies adopted by ARIADNEplus (e.g., the Getty AAT for subjects and PeriodO for time periods), which are documented in deliverable D5.2. Particular attention is paid to the mapping operations, and a detailed analysis of the progress made on these activities is provided for each partner and each discipline listed in Task 4.4. Chapter 7 presents the activities aimed at linking the ARIADNEplus Data Infrastructure with repositories of scientific publications, exploiting in particular OpenAIRE and its open access digital archives, as well as links to individual journals such as Internet Archaeology or A&C. The chapter also describes the use of the ARIADNEplus text mining service (Task 17.4) to improve the metadata for textual resources. The conclusions and an evaluation of the activities carried out in the first 18 months of the project are presented in Chapter 8, which also sets out the strategies proposed for future work and for the completion of WP4 activities.