Semantic Representation and Enrichment of Cultural Heritage Information for Fostering Reinterpretation and Reflection on the European History

Abstract. Modern advances in digital technologies provide wider access to information, enabling new ways of interacting with and understanding cultural heritage information and facilitating its presentation, access and reinterpretation. The paper presents a working example of connecting and mapping cultural heritage information and data from cultural heritage institutions and venues through the open technological platform of the CrossCult project. The process of semantically representing and enriching the available cultural heritage data is discussed, and the challenges of semantically expressing interrelations and groupings among physical items, venues, digital resources, and ideas are revealed. The paper also highlights the challenges in the creation of a knowledge base resource which aggregates a set of Knowledge Organization Systems (KOS): a carefully selected subset of the CIDOC Conceptual Reference Model, a set of application ontologies and an optimised classification scheme based on domain vocabularies.


Introduction
Semantic Web technologies can ease access to Cultural Heritage content, facilitating new ways of engaging with heritage for the general public and experts that go beyond a simple interactive engagement. They capture and describe the meaning of data and the connections among them, allowing an intelligent integration of resources via machine-readable and human-interpretable representations of domain knowledge that support retrieval, reasoning, optimal data integration and knowledge reuse across disparate cultural heritage resources [1]. The benefits of Semantic Web technologies to Cultural Heritage include a harmonised view of disparate and distributed content, as well as semantic-based content aggregation, search, browsing and recommendation [2,3].
The CrossCult Project, taking advantage of the advances of digital technologies and focusing particularly on the aspects of interactivity, recollection, and reflection, aims to demonstrate new ways for European citizens to appraise History. By facilitating interconnections between different pieces of cultural heritage information, public viewpoints and physical venues, the project aims to move beyond the siloed presentation of historical data and foster the re-interpretation of history as we know it. Such connections allow reflection and reinterpretation of historical and societal views to be triggered. The project employs four flagship pilot cases, which are used to demonstrate how augmentation, data linking, semantic-based reasoning and retrieval across diverse cultural heritage resources can be achieved and contribute to its history reflection and re-interpretation aims.
This paper outlines the role of standard conceptual models for mediating semantic interoperability and discusses the role of Reflective Topics as a conceptual vehicle for fostering cross-border perspectives and reinterpretation of European history. Further sections present the rationale of the modelling choices for addressing the semantic requirements of the project in terms of facilitating interconnections among digital resources, and discuss the implementation pathway leading to the definition of the CrossCult Knowledge Base. The paper concludes with a discussion on the benefits of the adopted method and the future steps towards a greater application of semantic and knowledge representation technologies in cultural heritage.

Background
There is an abundance of tools for managing and semantically modelling cultural heritage data, such as the Dublin Core (DC) Metadata Elements and DC Terms, the Simple Knowledge Organization System (SKOS), the Functional Requirements for Bibliographic Records (FRBR), the Europeana Data Model (EDM), the CIDOC-CRM, the MIDAS Heritage standard, the Lightweight Information Describing Objects (LIDO) and the VRA Core. These have been employed by numerous projects with varying degrees of success to aggregate and harmonise access to content across cultural heritage resources [4]. Among them, CIDOC-CRM, the Conceptual Reference Model (CRM) of the International Committee for Documentation (CIDOC) of the International Council of Museums, has become a well-established ISO standard (ISO 21127:2006) for modelling cultural heritage information, due to its ability to handle the variability and complexity of cultural heritage data [5]. It provides an extensible semantic framework that any cultural heritage information can be mapped to, based on real-world concepts and events, so that data are modelled with respect to empirically surfaced arrangements rather than artificial generalisations and fixed field schemas [6]. The aptness of CIDOC-CRM in modelling cultural heritage data is evidenced by several large-scale projects that integrate vast datasets of classical antiquity, museum exhibits and archaeological research, such as the Oxford University CLAROS project [7], the British Museum ResearchSpace and the EU FP7 Ariadne Infrastructure [8].
Making use of such technologies, Semantic Web portals provide a range of user-centred services, enabling information seeking activities of serendipitous and relational search, personalisation and context awareness. Most importantly, they are extendible to new types of information and new functionalities. Such portals are also very attractive from a publisher's or data provider's perspective because they facilitate the distributed creation and maintenance of links and content, which significantly benefits reusability, enrichment and intelligent content aggregation. Semantic Web portals support user experiences that revolve around an orthogonal access of information, through conventional search and browsing activities with respect to the semantics (classes and attributes) of a conceptual data model.
A fundamental aim of CrossCult is to free the user experience from conventional keyword search and hyperlink-based browsing of cultural heritage content by exploiting the advances of Semantic Web technologies to facilitate interconnections between pieces of cultural heritage information, public viewpoints and physical venues. To this aim, the project integrates innovations from the intersection of Humanities with Computer Science in order to trigger substantial reflection on history as we know it, focusing on aspects that are cross-cultural and cross-border, as well as on grand societal challenges, such as population movements, access to health services, women's place in society, power structures and others. It is a multidisciplinary research endeavour among historians, archaeologists, information scientists and software engineers, seeking innovative experiences of engagement with cultural heritage that stimulate reflection and help European citizens appreciate their past and present history. By exploiting the abilities of Semantic Web technology, the project establishes interconnections among gallery items, museum exhibits, archaeological sites and urban spaces (POIs), aimed at fostering cross-border perspectives and a holistic understanding and reinterpretation of European history from multiple points of view.
To this aim, the role of the Reflective Topic (a topic that people reflect on, stimulated via groupings or narratives that link together different cultural heritage resources or POIs) becomes indispensable. The notion of Reflective Topic encompasses all those conceptual connections that can be made to create a network of points of view, aiding reflection and prospective interpretation over a historical topic. Such narratives can captivate user engagement and create long-lasting experiences based on interconnections among existing digital historical resources and by creating new ones through the participation of the public.

Methodological Approach
The CRM ontology provides a set of elements, which capture generic concepts related to the Cultural Heritage domain. The representation of domain-specific or application-specific concepts is possible via the instantiation of the E55 Type class, which enables connection to categorical knowledge commonly found in cultural documentation. In CrossCult, we adopted a common data modelling methodology, which consists of modelling the available knowledge via the standard CIDOC-CRM classes and further defining project-specific concepts as types (instances of E55) linked to SKOS-based thesauri concepts [9]. The Simple Knowledge Organization System (SKOS) is a W3C recommendation designed for the representation of thesauri, classification schemes, taxonomies, or any other type of structured controlled vocabulary. It builds upon RDF and RDFS, and its main objective is to enable easy publication and use of such vocabularies as linked data. SKOS structures can be linked to CIDOC-CRM instances to provide a specialised vocabulary.
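As a minimal sketch of this modelling pattern, the following Python fragment builds the triples that type an item through a term declared both as an E55 Type and as a skos:Concept. The ECRM base IRI, the example.org namespace and the individual names are assumptions made for illustration; they are not taken from the CrossCult data.

```python
# RDF and SKOS are the W3C namespaces; ECRM and EX are assumed for illustration only.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
ECRM = "http://erlangen-crm.org/current/"   # assumed base IRI of the Erlangen CRM
EX = "http://example.org/crosscult/"        # hypothetical project namespace


def type_via_skos(item, crm_class, term):
    """Triples typing `item` through a term that is both an E55 Type and a skos:Concept."""
    item_iri = EX + item
    term_iri = EX + "term/" + term
    return [
        (item_iri, RDF + "type", ECRM + crm_class),
        (term_iri, RDF + "type", ECRM + "E55_Type"),
        (term_iri, RDF + "type", SKOS + "Concept"),
        (item_iri, ECRM + "P2_has_type", term_iri),
    ]


def to_ntriples(triples):
    """Serialise IRI-only triples, one N-Triples statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = type_via_skos("exhibit_001", "E22_Man-Made_Object", "canvas_painting")
print(to_ntriples(triples))
```

Declaring the term under both classes is what lets the same individual take part in CRM-based reasoning and in SKOS-based vocabulary navigation.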
To address the semantic requirements of the project in terms of facilitating interconnections among digital resources and to relate such resources to Reflective Topics, we adopted two distinct but complementary Knowledge Organization Systems (KOS): a domain ontology (CrossCult Upper-level Ontology); and a domain vocabulary (CrossCult Classification Scheme) in the form of a faceted classification of terms mapped to a set of specialised thesauri, which provide additional categorisations and groupings in the form of semantic shortcuts. This combination provides a common layer of conceptual arrangements, which we define as the CrossCult Knowledge Base. The ontological structure enables semantic-based reasoning and retrieval across disparate data, while the formally expressed classification of domain concepts supports data enrichment and augmentation. In this sense, the project focuses on the construction of an environment hosting and enhancing semantic representations emerging from cultural artefacts, monuments and places based on methods of data modelling and mapping with respect to well-defined interoperable semantics.

CrossCult Upper-Level Ontology Rationale
Based on the merits of comprehensiveness, specialisation and extensibility of CIDOC-CRM, the project adopts the W3C Web Ontology Language (OWL) version of CIDOC-CRM as defined by the Erlangen implementation (ECRM160714) [10]. The model guarantees the use of well-defined and interoperable semantics, which facilitate the definition of an ontology aimed at capturing formalisms that describe the "world" of CrossCult in terms of common conceptual arrangements and relationships between people, places, things, events and periods across a diverse range of cultural heritage resources. Figure 1 presents the abstract model of the CrossCult Upper-Level ontology, which describes the semantics of the top layer of the ontology, including the relationships among CIDOC-CRM and project-specific entities. The project-specific class Reflective Topic incorporates the semantics of reflection, enabling interconnection between physical or conceptual things of man-made or natural origin. It can be understood as an extension of the CRM E89 Propositional Object entity, complemented by the project-specific reflects property, which is a reversed and extended definition of the CRM property P129 is about. The property in its original form describes the primary subject of a propositional object. The reflects property definition sets an instance of a Reflective Topic as the primary subject of reflection of a physical or conceptual thing.
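The way the reflects property interconnects otherwise unrelated things can be sketched with a few assertions held as plain pairs. The identifiers below are hypothetical and the pairing of items with topics is illustrative only; the sketch simply shows that two things become connected once they reflect the same Reflective Topic.

```python
from collections import defaultdict

# (thing, reflective topic) pairs standing in for `reflects` assertions.
# All identifiers are hypothetical and chosen only for illustration.
reflects = [
    ("lugo_roman_baths", "healing_power_of_water"),
    ("epidaurus_theatre", "healing_power_of_water"),
    ("luxembourg_poi_12", "migration_through_history"),
]


def things_by_topic(assertions):
    """Index the things that reflect each topic."""
    index = defaultdict(set)
    for thing, topic in assertions:
        index[topic].add(thing)
    return index


def connected_via_topic(assertions, thing):
    """Other things that share at least one Reflective Topic with `thing`."""
    index = things_by_topic(assertions)
    related = set()
    for candidate, topic in assertions:
        if candidate == thing:
            related |= index[topic]
    related.discard(thing)
    return related


print(connected_via_topic(reflects, "lugo_roman_baths"))
```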

CrossCult Classification Scheme Rationale
CIDOC-CRM, as a formal and generic structure of concepts and relationships, is not tied to any particular vocabulary of types, terms and individuals. This level of abstraction, albeit useful for the semantics of the broader cultural heritage domain, does not cover the need for a finer definition of types, terms and appellations. The need for an additional level of vocabulary semantics is addressed by the CrossCult Classification Scheme. The CrossCult Classification Scheme, incorporated into the upper-level ontology, aims at linking semantic concepts with: (a) subjects describing art objects in terms of their meaning, depiction and/or symbolism (e.g. middle class - social class), as well as names as subjects (e.g. Apollo) and forms as subjects (e.g. basilicae, aqueducts); and (b) a broader and complex set of concepts that prompt visitors to stop and reflect during or after their visit to the venues. These concepts, broad in their scope, consist of abstract notions that combine several terms in order to produce a new notion and stimulate "reflection in history" and on social values, e.g. migration through history, healing power of water, religion and social development, etc.
The CrossCult Classification Scheme is a project-specific tool that organises, in a faceted taxonomy, terms connected by their use and function in the project: these may represent concepts, people as subjects and types or forms of works. The Scheme draws from the basic principles of organising knowledge in broad semantic areas and building narrower concepts within them. The aim was a simple scheme that has neither the full requirements of a complicated subject organisation tool such as a thesaurus, nor the simpler form of a "subject heading tool"; this led us to base our concept on the principles of the basic SKOS concept elements.
The following steps were taken in constructing the CrossCult Classification Scheme: (a) The contributing terms were derived from the specifications of the four pilots and the descriptions of the relevant cultural heritage objects. (b) The terms were verified against three standard vocabularies, the Getty Art & Architecture Thesaurus (AAT), EUROVOC and the Library of Congress (LC) Subject Authorities, and mapped to the authority vocabulary resources using the skos:closeMatch property. This secured compatibility and direct linking to the authority of the controlled vocabulary. Project-specific terms that could not be verified in external sources were also included, documented (mimicking the process used within the existing classification schemes) and incorporated within the classification scheme structure. (c) Terms were organised in a faceted taxonomy, allowing the assignment of multiple classifications to each term. Terms of similar specificity were placed at the same hierarchical level. Broader and narrower term relationships were established based on the guidelines of AAT, EUROVOC and LC, whilst special effort was made to create sound hierarchical relationships of project-specific terms. Apart from the standard facets that one can find in most common vocabularies (e.g. Activities, Culture, People, etc.), the CrossCult Classification Scheme contains two additional facets, which were created for the specific needs of the project: a facet for Types, accommodating the terms that are used to describe types of the entities defined in the ontology; and a facet for Reflective Topics, accommodating the terms that are used to describe topics of historical reflection. The ways that the vocabulary terms are linked to the elements of the ontology are described in Sect. 4.4.
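The structure resulting from steps (a) to (c) can be sketched as a small in-memory record per term. This is an illustrative data structure, not the project's actual implementation: the field names are assumptions, the facet placement of the sample terms is hypothetical, and the external IRI is a placeholder rather than a real authority identifier.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Term:
    label: str
    facets: Set[str]                    # a term may be classified under several facets
    broader: Optional[str] = None       # label of the broader term, if any
    close_match: List[str] = field(default_factory=list)  # skos:closeMatch targets


# Illustrative entries only; the external IRI is a placeholder, not a real AAT identifier.
scheme = [
    Term("social class", {"People"}),
    Term("middle class", {"People"}, broader="social class",
         close_match=["http://example.org/authority/PLACEHOLDER"]),
    Term("migration through history", {"Reflective Topics"}),
]


def narrower(terms, label):
    """Labels of the terms placed directly under `label`."""
    return sorted(t.label for t in terms if t.broader == label)


def by_facet(terms, facet):
    """Labels of the terms classified under `facet`."""
    return sorted(t.label for t in terms if facet in t.facets)


def unverified(terms):
    """Terms with no recorded match in an external authority vocabulary."""
    return sorted(t.label for t in terms if not t.close_match)


print(narrower(scheme, "social class"))
print(unverified(scheme))
```

Keeping facets as a set reflects the multiple classification allowed in step (c), while an empty close_match list marks the project-specific terms that had to be documented locally.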

Implementation
Four flagship pilot cases from eight venues across Europe participate in the project, comprising (as these are described on the project website): a large multi-thematic venue, many small venues, a single venue (non-typical transversal connections), and multiple cities (Past-Present interplay). Such a pluralistic environment of cultural heritage resources represents a considerable variety of cataloguing approaches and data structures. In order to connect these resources, they were all mapped to the CrossCult Upper-level Ontology and the CrossCult Classification Scheme. This semantic alignment and mapping of the contributing resources to a common reference layer was necessary for obtaining the benefits of interconnection, cross-searching, relational search, and context awareness of digital resources.
The available resources were modelled as ontology individuals (ontology population). The individuals were then enriched with semantic definitions and linked to Semantic Web resources from DBpedia, Wikidata and elsewhere, as well as to the CrossCult Classification Scheme, to provide additional subject-based definitions to the ingested data.

CrossCult Pilots
A coordinated effort between historians, information scientists, and pilot representatives examined the objectives of the pilots' reflection use cases and reviewed the contributing cultural heritage resources, in order to define the reflective topics for each pilot. In detail, the four pilots encompass the following combinations of data resources and reflection objectives.
Pilot 1, Large multi-thematic venue (National Gallery, UK). The collection contains information about paintings such as medium and support, dimensions, date of production, location in the gallery, information about the related artists, and other data explicitly related to each painting. There is also an extensive use of various types that describe paintings in terms of design techniques, styles and materials while a set of subject keywords is also available to refine the descriptions of the paintings and their relations to different concepts and themes. Reflection is encouraged by tailored recommendations that support engagement with the content based on user preference and knowledge. The user experience is advanced beyond a single choreographed route, allowing users to create their own virtual groupings and presentations, and compare their experiences with other users and the current presentation of a collection.
Pilot 2, Many small venues (Roman healing spa of Lugo, Spain and Chaves, Portugal, archaeological site of Aquae Tauri, Italy and the ancient theatre of Epidaurus, Greece). The pilot contributes data resources from four separate archaeological sites and as a result, data coverage ranges from extended descriptions of objects from the archaeological sites to simple, almost telegraphic entries of objects and their associated subject keywords. In addition, data contains references to entities other than physical objects, including monuments, physical features, activities, historical locations, and people. Reflection happens by exploring connections among items aided by experts' input to enable interpretative thinking, comparison and knowledge discovery.
Pilot 3, One venue, non-typical transversal connections (Archaeological museum of Tripolis, Greece). The pilot contributes data from a small Greek archaeological museum, containing descriptions about the temporal, geometrical, spatial and contextual characteristics of the exhibits. The descriptions do not vary significantly in terms of size and level of detail, albeit some descriptions are a little longer than others. The modelling requirements of this pilot draw some parallels with Pilot 1 in terms of semantically describing temporal, spatial and contextual information. Reflection is promoted by tailoring the narratives in a way that raises empathy among the participants, enabling prospective interpretation and unexpected learning, which may happen by relating elements from the narratives to aspects of the participant's life, as well as through meaningful comparisons between the past and his/her present.
Pilot 4, Multiple cities, "Past & Present" interplay (City of Luxembourg, Luxembourg and Valletta, Malta). The pilot contributes data from a sample of several Points of Interest (POI) located in contemporary urban spaces. The data focuses on the relationship of POIs with specific reflective narratives and multimedia that drive the narratives and navigate the users of a mobile app towards the location of POIs. The data describes attributes of the POIs, including spatial, geometric and temporal information as well as reflective narratives and relevant multimedia. The pilot aims at a collaborative reflection over key topics of population movement and immigration in order to provoke comparisons on the topic of immigrant integration in the present and the past and enable users to reflect over and reinterpret migration-related events under different situations than those of the original event.

Semantic Alignment and Mapping
The mapping process addressed common modelling requirements across the four pilots with regard to spatial, temporal, geometrical, and other associative interpretations of data. Attention was paid to the extensibility of the proposed model for accommodating potential future uses, whilst catering for any particular specialisation requirements hinted at by the pilots. In this respect, the CIDOC-CRM proved an invaluable instrument for capturing the common semantic definitions of the participating cultural heritage resources whilst providing a clear documented process for additional project-specific extensions. Figure 2 presents the modelling arrangements of the common semantics across the four pilots of the CrossCult project. At the core of the model resides the CIDOC-CRM entity E18 Physical Thing, which comprises all persistent physical items with a relatively stable form, man-made or natural. The entity enables the representation of a vast range of items of interest, such as museum exhibits, gallery paintings, artefacts, monuments and points of interest, whilst providing extensions to specialised entity definitions of targeted semantics for man-made objects, physical objects and physical features. The arrangement benefits from a range of relationships between E18 Physical Thing and a set of entities that describe the static parameters of an item, such as dimension, unique identifier, title, and type. The model also allows the description of more complex objects through a composition of individual items (i.e. P46 is composed of). Moreover, the well-defined semantics enable rendering of rich relationships between the physical item and entities describing the item in terms of ownership, production, location, and other conceptual associations. The project-specific property reflects has been added to enable specific, direct connections between existing concepts and the CrossCult class Reflective Topic.
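To illustrate how composition through P46 is composed of describes complex objects, the sketch below holds a handful of statements as plain triples and collects every direct and indirect part of an item. The individuals are hypothetical; only the property names follow CIDOC-CRM, and the traversal is an assumption about how such data might be queried rather than a description of the project's tooling.

```python
P46 = "P46_is_composed_of"

# Hypothetical individuals; only the property names follow CIDOC-CRM.
triples = [
    ("monument_01", "P2_has_type", "type_monument"),
    ("monument_01", "P102_has_title", "title_monument_01"),
    ("monument_01", P46, "base_01"),
    ("monument_01", P46, "relief_01"),
    ("relief_01", P46, "fragment_01"),
]


def parts_of(statements, whole):
    """All direct and indirect parts of `whole`, found by following P46 links."""
    found, stack = set(), [whole]
    while stack:
        current = stack.pop()
        for subject, prop, obj in statements:
            if subject == current and prop == P46 and obj not in found:
                found.add(obj)
                stack.append(obj)
    return sorted(found)


print(parts_of(triples, "monument_01"))
```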

Population and Enrichment of the CrossCult Knowledge Base
The Population and Enrichment phase applied the conceptual arrangements and definitions of the CrossCult ontology to a range of disparate data resources originating from the four pilots of the project whilst linking a selected set of ontology individuals to Semantic Web resources and definitions. During ontology population, the tasks of data decoupling, cleansing and semantic enrichment were performed and a diverse range of cultural heritage data was mapped to a common layer of semantics complying with the CrossCult Ontology [15].
Three separate stages addressed issues arising from the heterogeneity of the available data. The Manual Data Extraction stage imposed a unified data structure across a range of unstructured sample data available in text format. The task identified textual instances of relevant types (i.e. type of exhibit and related material), temporal and spatial information, dimensions, and other features of interest such as inscriptions or visual representations. The Semi-Automatic Database Construction stage populated a set of relational database tables with structured data from spreadsheets originating directly from the pilots. The Automatic OWL Generation stage ingested the structured data of the relational database into the ontology. The process employed a series of PHP routines driven by SQL queries for retrieving selected database records and declaring them as ontology individuals using OWL class and property assertions. The routines cater for the automatic generation of statements with respect to individual declaration, class assertion, object property assertion, and data property assertion.
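The Automatic OWL Generation stage can be approximated by the following sketch, written in Python rather than the PHP used in the project. It reads records from an in-memory SQLite table and emits the four kinds of statement named above in OWL 2 functional-style syntax. The table layout, the prefixes and the sample record are assumptions for illustration, as is the choice of P3 has note for the data property.

```python
import sqlite3


def escape(text):
    """Escape backslashes and double quotes for use inside a quoted literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def row_to_owl(ident, crm_class, type_id, note):
    """Emit the four assertion kinds for one database record."""
    return [
        f"Declaration(NamedIndividual(:{ident}))",
        f"ClassAssertion(ecrm:{crm_class} :{ident})",
        f"ObjectPropertyAssertion(ecrm:P2_has_type :{ident} :{type_id})",
        f'DataPropertyAssertion(ecrm:P3_has_note :{ident} "{escape(note)}")',
    ]


# Hypothetical table layout and sample record standing in for the pilot database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (ident TEXT, crm_class TEXT, type_id TEXT, note TEXT)")
conn.execute(
    "INSERT INTO item VALUES (?, ?, ?, ?)",
    ("exhibit_001", "E22_Man-Made_Object", "type_relief",
     'Marble relief with "votive" inscription'),
)

statements = []
for row in conn.execute("SELECT ident, crm_class, type_id, note FROM item ORDER BY ident"):
    statements.extend(row_to_owl(*row))
conn.close()

print("\n".join(statements))
```

Escaping the literal before it is embedded matters in practice, since free-text descriptions taken from spreadsheets often contain quotation marks.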
The semantic enrichment phase enriched a selected set of ontology individuals with links to standard and well-known Semantic Web resources, such as DBpedia and the Getty Art & Architecture Thesaurus. The symmetric property owl:sameAs is employed to link individuals to DBpedia resources. The process provided additional definitions, consistent standardised descriptions, and enhanced connections, improving the utility and interoperability of content, as demonstrated in Fig. 3. The figure presents the classification and relationships of ontology individuals describing the National Gallery painting by Eustache Le Sueur, Alexander and his doctor, about 1648-9 (NG6576). The painting is modelled as an instance of E22 Man-Made Object, uniquely identified by a National Gallery (UK) reference and associated with a conceptual type (Canvas painting). The information related to the production of the painting, such as date of production, artist and technique, is handled by the semantics of a production event.
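A rough sketch of this arrangement is given below: a production event connects the painting to its artist and time-span, and an owl:sameAs link ties the local artist individual to an external resource. The local identifiers, the ECRM base IRI and the DBpedia IRI are assumptions for illustration and have not been checked against the actual knowledge base or against DBpedia.

```python
ECRM = "http://erlangen-crm.org/current/"   # assumed base IRI of the Erlangen CRM
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
EX = "http://example.org/crosscult/"        # hypothetical project namespace


def production_triples(obj, artist):
    """Route artist and date through a production event instead of the object itself."""
    event = EX + obj + "_production"
    return [
        (EX + obj, ECRM + "P108i_was_produced_by", event),
        (event, ECRM + "P14_carried_out_by", EX + artist),
        (event, ECRM + "P4_has_time-span", event + "_timespan"),
    ]


def same_as(local, external):
    """Materialise both directions of the link, since owl:sameAs is symmetric."""
    return [(local, OWL_SAME_AS, external), (external, OWL_SAME_AS, local)]


graph = production_triples("NG6576", "eustache_le_sueur")
# The DBpedia IRI below is assumed, not verified.
graph += same_as(EX + "eustache_le_sueur", "http://dbpedia.org/resource/Eustache_Le_Sueur")

for triple in graph:
    print(triple)
```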

Vocabulary Integration and Association
The CrossCult Classification Scheme was integrated into the Upper-Level ontology, delivering a unified Knowledge Base resource, as depicted in Fig. 4. Vocabulary terms were all defined as instances of the skos:Concept class, and connected to external vocabulary resources using appropriate properties, as described in Sect. 3.2. Terms referring to types were classified under E55 Type and were associated with the individuals they describe via the P2 has type property. Terms that represent subjects used to enrich the semantic description of cultural heritage objects or places were classified under E89 Propositional Object. They were then associated with the objects, places or reflective topics they refer to via the P67 refers to property. Finally, vocabulary terms referring to Reflective Topics were classified under the project-specific ontology class Reflective Topic, and were associated with the entities that drive the corresponding reflection via the reflects property.
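The three association rules can be summarised in a small lookup table, sketched below. The role names, the prefixed identifiers and the cc: prefix for project-specific elements are assumptions for illustration; the direction of each link follows the description above, with the term as the subject of P67 refers to and as the object of P2 has type and reflects.

```python
RDF_TYPE = "rdf:type"

# role -> (class of the term, linking property, whether the term is the subject of the link)
RULES = {
    "type": ("ecrm:E55_Type", "ecrm:P2_has_type", False),
    "subject": ("ecrm:E89_Propositional_Object", "ecrm:P67_refers_to", True),
    "reflective_topic": ("cc:Reflective_Topic", "cc:reflects", False),
}


def associate(term, role, entity):
    """Classify a vocabulary term and link it to the entity it describes."""
    term_class, prop, term_is_subject = RULES[role]
    link = (term, prop, entity) if term_is_subject else (entity, prop, term)
    return [
        (term, RDF_TYPE, "skos:Concept"),
        (term, RDF_TYPE, term_class),
        link,
    ]


# Hypothetical individuals used only to exercise the three rules.
for triple in associate("cc:apollo", "subject", "cc:exhibit_001"):
    print(triple)
```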

Discussion
CrossCult is now in its second year, during which the remaining parts of the CrossCult Knowledge Base are being finalised based on the same data modelling methodology and principles used for the Upper-level ontology. Specifically, our ongoing work includes: adding ontological definitions for other project-related concepts, such as the pilots' venues and the users of the pilot apps; further refining the scope and structure of Reflective Topics and their relation to keywords, narratives and other reflection proposals; augmenting the data with media content and narratives that enhance their reflection and reinterpretation qualities; and further semantically enriching the resource descriptions with more links to external standardised Semantic Web resources. At the same time, the CrossCult platform and mobile apps for the four project pilots are being developed. The CrossCult platform consists of: front-end tools, which can be used by experience designers, museum experts/curators and external stakeholders to develop market-ready applications; and a back-end, which integrates technologies for the storage and management of the available information and digital resources and supports any other functionalities needed by the front-end and the mobile apps (e.g. route/path recommendation [11], personalisation [12], micro-augmentations [13], games [14], etc.).
The semantic-based design of the CrossCult Knowledge Base, presented in this paper, enhances the capabilities of the CrossCult platform and the mobile apps in many different ways. It enables the development of services, e.g. for search, navigation, route finding, etc., that (i) can be tailored to the needs and preferences of each user; (ii) can highlight associations between different cultural heritage resources or venues and form groupings of items from the venues' collections under certain historical topics, serving the history reflection and re-interpretation aims of the project; (iii) augment the user experience by linking the cultural heritage resources of a venue with external historical, geographical or other types of information or digital resources; (iv) support different kinds of visualisation of a venue's collection based on temporal, spatial or any other kinds of contextual relationships; (v) are extendible to more types of information and, therefore, new functionalities. Moreover, given the growing popularity of CIDOC-CRM and Semantic Web technologies among cultural heritage institutions, the CrossCult Knowledge Base will contribute to the development of a broader knowledge-based network of museums, galleries and other cultural heritage venues, which in the future could enable the development of unified services for the visitors of their physical or virtual collections, built on top of a global knowledge base for cultural heritage.