Published November 26, 2010 | Version v1
Project deliverable | Open access

CARARE - D2.5 White paper on CARARE technical approach (Final)

  • 1. KUAS

Description

Executive Summary

This deliverable describes in detail the overall technical architecture of the CARARE project. The goal of the project is to harvest content about archaeology and architecture (mostly monuments) from a number of content providers and deliver it to Europeana. The technical architecture will be implemented and supported by the project's two technical partners: the National Technical University of Athens (NTUA) and the Digital Curation Unit of the Athena Research Centre (DCU). The architecture specifies a three-stage process, described below.

CARARE content is heterogeneous: each content provider uses its own schema, catalogues information in its own way, uses its own coordinate system to denote location, and so on. The first step was therefore to create a rich and expressive schema, known as the CARARE schema and described in a separate deliverable (D2.2), that can accommodate most of the information from all content providers while keeping its complexity manageable. Each provider's schema will be mapped to the CARARE schema; the providers' data will then be harvested and transformed according to that mapping. This first stage of the process will be handled by NTUA's mapping tool.
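The mapping step can be illustrated with a minimal Python sketch. It assumes that a harvested record arrives as a flat dictionary of provider fields and that a per-provider mapping table gives a target path for each field. The field names (`site_name`, `easting`, and so on) and target paths (`asset/name`, and so on) are hypothetical placeholders, not actual CARARE schema elements, and the sketch makes no claim about how the NTUA mapping tool is implemented.

```python
from typing import Any

# Hypothetical mapping for one provider: provider field -> target path.
# Target paths are illustrative placeholders, NOT real CARARE schema elements.
PROVIDER_MAPPING: dict[str, str] = {
    "site_name": "asset/name",
    "period": "asset/temporal",
    "easting": "asset/spatial/x",
    "northing": "asset/spatial/y",
}


def set_path(target: dict[str, Any], path: str, value: Any) -> None:
    """Create nested dicts along a slash-separated path and set the leaf value."""
    *parents, leaf = path.split("/")
    node = target
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def apply_mapping(
    record: dict[str, Any], mapping: dict[str, str]
) -> tuple[dict[str, Any], list[str]]:
    """Transform one harvested record; return the result and any unmapped fields."""
    result: dict[str, Any] = {}
    unmapped: list[str] = []
    for field, value in record.items():
        path = mapping.get(field)
        if path is None:
            # Collected so the mapping can be reviewed and extended later.
            unmapped.append(field)
        else:
            set_path(result, path, value)
    return result, unmapped


if __name__ == "__main__":
    harvested = {"site_name": "Knossos", "period": "Bronze Age", "notes": "n/a"}
    mapped, missing = apply_mapping(harvested, PROVIDER_MAPPING)
    print(mapped)   # {'asset': {'name': 'Knossos', 'temporal': 'Bronze Age'}}
    print(missing)  # ['notes']
```

Keeping the mapping as data rather than code means that adding a new provider only requires a new mapping table, which is in the spirit of mapping every provider schema onto one common schema.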

The second stage is the semantic enrichment of the content. To accomplish this, content must be ingested into a repository that provides the necessary enrichment services; this repository and its services will be implemented and managed by the DCU. Enrichment involves several distinct functions: checking content quality and notifying providers whose content needs enrichment; adding semantic relations between items across different collections (and content providers); handling geographic coordinates (normalizing different coordinate systems and finding items that lie within a given distance of each other); previewing the content; and so on. The repository will manage the whole enrichment process, keeping track of all changes (versioning) and conforming to established preservation standards.
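The proximity check can be sketched in Python as follows. This assumes the coordinates have already been normalized to WGS84 latitude/longitude in decimal degrees, and it uses the haversine great-circle formula. The item IDs and coordinates are hypothetical, and the choice of haversine is an illustrative assumption rather than the repository's documented method.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_pairs(items: dict[str, tuple[float, float]], threshold_km: float):
    """Return (id_a, id_b, distance_km) for every pair closer than threshold_km.

    Brute force over all pairs (O(n^2)); adequate for a sketch, whereas a
    large collection would call for a spatial index.
    """
    pairs = []
    for (id_a, (lat_a, lon_a)), (id_b, (lat_b, lon_b)) in combinations(items.items(), 2):
        d = haversine_km(lat_a, lon_a, lat_b, lon_b)
        if d < threshold_km:
            pairs.append((id_a, id_b, d))
    return pairs


if __name__ == "__main__":
    # Hypothetical item IDs with (lat, lon) coordinates.
    items = {
        "item-a": (37.9715, 23.7257),
        "item-b": (37.9755, 23.7348),
        "item-c": (35.2980, 25.1632),
    }
    # Prints a single pair: item-a and item-b, roughly 0.9 km apart.
    for a, b, d in nearby_pairs(items, threshold_km=2.0):
        print(f"{a} <-> {b}: {d:.2f} km")
```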

The third and final stage is the delivery of the transformed content to Europeana in the appropriate format (currently EDM v5.2). This requires a transformation from the CARARE schema to EDM, whose specification has been made available in another deliverable (D2.2).
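A minimal Python sketch of the CARARE-to-EDM step is shown below. It assumes a simplified asset record held as a dictionary. The input keys (`id`, `name`, `provider`, `landing_page`, `base_uri`) and the `example.org` URIs are hypothetical. The EDM classes and properties used (`edm:ProvidedCHO`, `ore:Aggregation`, `dc:title`, `edm:aggregatedCHO`, `edm:dataProvider`, `edm:isShownAt`) are real EDM terms. However, the output omits properties that EDM requires, such as `edm:provider` and `edm:rights`, and it does not follow the D2.2 specification.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "edm": "http://www.europeana.eu/schemas/edm/",
    "ore": "http://www.openarchives.org/ore/terms/",
}
# Register prefixes so serialized output uses rdf:, dc:, edm:, ore:.
for _prefix, _uri in NS.items():
    ET.register_namespace(_prefix, _uri)


def q(prefix: str, local: str) -> str:
    """Build an ElementTree qualified name of the form {uri}local."""
    return f"{{{NS[prefix]}}}{local}"


def carare_to_edm(asset: dict) -> ET.Element:
    """Map a hypothetical, simplified asset dict to a minimal EDM RDF/XML tree."""
    root = ET.Element(q("rdf", "RDF"))
    cho_uri = f"{asset['base_uri']}/cho/{asset['id']}"

    # The cultural heritage object itself.
    cho = ET.SubElement(root, q("edm", "ProvidedCHO"), {q("rdf", "about"): cho_uri})
    ET.SubElement(cho, q("dc", "title")).text = asset["name"]

    # The aggregation links the object to provider information and web resources.
    agg = ET.SubElement(
        root,
        q("ore", "Aggregation"),
        {q("rdf", "about"): f"{asset['base_uri']}/aggregation/{asset['id']}"},
    )
    ET.SubElement(agg, q("edm", "aggregatedCHO"), {q("rdf", "resource"): cho_uri})
    ET.SubElement(agg, q("edm", "dataProvider")).text = asset["provider"]
    ET.SubElement(agg, q("edm", "isShownAt"), {q("rdf", "resource"): asset["landing_page"]})
    return root


if __name__ == "__main__":
    asset = {
        "id": "42",
        "name": "Knossos",
        "provider": "Example Provider",
        "landing_page": "http://example.org/records/42",
        "base_uri": "http://example.org",
    }
    print(ET.tostring(carare_to_edm(asset), encoding="unicode"))
```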

The CARARE system consists of two main components: the mapping tool and the repository. Communication and information flow between these two components is crucial; it has been given special attention and is described in detail in this document. A dedicated communication protocol has been designed for this purpose. It uses REST-based web services for communication, and information is exchanged in Submission Information Packages (SIPs), which are designed with information preservation in mind.
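The summary does not specify the package layout or the service endpoints, so the following Python sketch only illustrates the general idea under stated assumptions. A SIP is modelled as a zip archive holding the record files plus a `manifest.json` with SHA-256 checksums; this is a hypothetical layout, not the OAIS or CARARE SIP format. The repository side re-verifies the checksums on receipt, and the package is submitted with an HTTP POST to a hypothetical endpoint URL. Checksums are included because fixity information lets the receiving repository detect corruption in transit, which matches the preservation focus of the packages.

```python
import hashlib
import io
import json
import urllib.request
import zipfile


def build_sip(records: dict[str, bytes], provider_id: str) -> bytes:
    """Package records (file name -> bytes) as a zip with a checksum manifest."""
    manifest = {
        "provider": provider_id,
        "files": {name: hashlib.sha256(data).hexdigest() for name, data in records.items()},
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in records.items():
            zf.writestr(f"data/{name}", data)
        zf.writestr("manifest.json", json.dumps(manifest, indent=2, sort_keys=True))
    return buf.getvalue()


def verify_sip(package: bytes) -> bool:
    """Repository side: recompute each checksum and compare it with the manifest."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        return all(
            hashlib.sha256(zf.read(f"data/{name}")).hexdigest() == digest
            for name, digest in manifest["files"].items()
        )


def submission_request(package: bytes, endpoint: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the SIP."""
    return urllib.request.Request(
        endpoint,
        data=package,
        method="POST",
        headers={"Content-Type": "application/zip"},
    )


if __name__ == "__main__":
    sip = build_sip({"record-1.xml": b"<record/>"}, "provider-42")
    print(verify_sip(sip))  # True
    # Hypothetical endpoint; sending would use urllib.request.urlopen(req).
    req = submission_request(sip, "http://repository.example.org/sips")
    print(req.get_method(), req.full_url)
```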

Files

carare_d2_5_technical_approach_final_.pdf (515.7 kB, md5:ce61b0a0ade7c5c7c41421c4dac34134)