Edinburgh Research Explorer Establishing Core Concepts for Information-Powered Collaborations

Science benefits tremendously from mutual exchanges of information and pooling of effort and resources. The combination of different skills and diverse knowledge is a powerful capacity, source of new intuitions and creative insights. Therefore multidisciplinary approaches can be a great opportunity to explore novel scientific horizons. Collaboration is not only an opportunity, it is essential when tackling today’s global challenges by exploiting our fast growing wealth of data. In this paper we introduce the concept of Information-Powered Collaborations (IPC) – an abstraction that captures those requirements and opportunities. We propose a conceptual framework that partitions the inherent complexity of such dynamic environments and offers concrete tools and methods to thrive in the data revolution era. Such a framework promotes and enables information sharing from multiple heterogeneous sources that are independently managed. We present the results of assessing our approach as an IPC for solid-Earth sciences: the European Plate Observing System (EPOS).


Introduction
Cooperation and collaboration have characterised the organisation of work in various contexts throughout history. Consequently, support for collaborative work has long been investigated by disciplines such as Computer Supported Cooperative Work (CSCW). Since the mid-1980s a rich CSCW literature has produced several theories and approaches to model and improve collaborative work and to sustain the sharing of knowledge and expertise [1,2,3]. The importance of scientific collaborations is not only well recognised but also encouraged and fostered, e.g. by policy makers and funding bodies, as a way to improve impact, to achieve cost-efficiency and to tackle the pressing data challenges faced by nearly all scientific disciplines. Collaborations based on information sharing contribute different viewpoints and combine skills and intellectual efforts to tackle the increasing complexity of contemporary scientific challenges. We propose a conceptual framework that combines two ingredients – collaborations and data – to help establish Core Concepts (CC) underpinning Information-Powered Collaborations (IPC).
Cooperation among diverse actors carries inherent socio-technical issues and requires us to maintain 'a common terminology and shared knowledge base' that enable communication and understanding [4]. The framework proposed in this paper provides a set of tools to build and maintain a common vocabulary and a shared information space.
Data and research collaborations are strongly interrelated, and collaboration starts with sharing. Data sharing has received considerable attention in the last decade and is widely recognised as an accelerating factor for scientific progress [5]. Nevertheless, data sharing is just one aspect underpinning research collaborations. Equally important are the sharing of methods, context and best practices, and the understanding of implicit communication rules, norms and prior knowledge that form the culture of the involved scientific communities (Designated Communities) [6]. Many aspects of the culture, such as formalised methods and data-access rules, may be represented as shareable data so that extensive distributed collaboration can be better supported.
Building research collaborations is a major endeavour that requires time and investments that increase rapidly with the diversity and the number of involved parties. Retaining the value of those investments, sustaining and maintaining efforts over time are necessary strategic choices. The management of research collaborations ought to interface and account for the organisational structures present in each community. Different strategies may be needed to address these issues.
Our framework builds on autonomous sources of information and a set of formal agreements to support IPC, combining intellectual effort and pooling resources and expertise from multiple independent organisations. The framework is based on: a) a Canonical Core that holds b) Core Concepts and connects to c) a set of dynamic Boundary Regions needed to sustain a community's agility and innovative drive.
Such a framework enables holistic views of the autonomous sources whilst preserving specialised domain specific views. In this paper we present such a framework and describe its application in the context of solid-Earth sciences. The remainder of the paper is organised as follows: in Section 2 we present the rationale that motivated this effort; Section 3 contains related work; in Section 4 we describe our conceptual framework; in Section 5 we introduce a research infrastructure for solid-Earth sciences, the European Plate Observing System (EPOS), and illustrate an application of our methodology in that context; Section 6 provides a preliminary evaluation of the presented approach; finally in Section 7 we draw conclusions and outline future work.

Information-Powered Collaborations
We define the concept of Information-Powered Collaborations (IPC) below. IPC is an abstraction that represents a typical modern research context characterised by rich interactions, exchanges and complex dynamics. Traditionally the research scene was dominated by research groups in controlled environments, with limited interactions with their peers [7]. The data revolution has deeply impacted every domain demanding a paradigm shift where collaboration is essential to manage the amount of data and to interpret the derived information. The IPC can offer a means to address and tackle today's challenges stimulating and facilitating pooling of knowledge (as well as data and information). To achieve this, the IPC must fulfil a number of requirements, as we illustrate below.

Use cases
In this section we present a selection of use cases and the corresponding requirements that an IPC should fulfil.

Resource discovery
Resource discovery implies the search of high-level descriptions (metadata) carrying information about, for instance, the type, name and origin of a resource. It entails operations such as selection and filtering against specified criteria. Examples of a multi-faceted search crossing domains: FIND all time series catalogued since date, time giving geochemical emission, seismic activity and surface movement for Etna; or FIND the seismic events in 2017 in Southern Europe together with geology, Global Navigation Satellite System (GNSS) velocity and satellite data correlated with those events.

Resource evaluation
Resource evaluation requires deeper descriptions of resources (e.g. domain-specific and contextual metadata) [8,9]. It exploits additional metadata fields beyond the classification of a resource in order to query, select and filter actual instances of resources according to desired characteristics. Example: FIND all the seismic events with magnitude M > 5 that occurred in a time-window (Tw), in a specific region (Re), AND the related primary data (seismic waveforms) with fewer than 5% gaps in Tw, AND the GPS displacement maps associated with (Tw, Re).
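To make the evaluation query above concrete, the following is a minimal sketch of how its selection criteria could be applied to a small in-memory catalogue. It is not the EPOS implementation: the record fields, the bounding-box representation of the region Re, and the sample entries are all hypothetical, and the association with GPS displacement maps is left out.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SeismicEvent:
    # Hypothetical metadata record; field names are illustrative only.
    event_id: str
    magnitude: float
    time: datetime
    lat: float
    lon: float
    waveform_gap_fraction: float  # share of missing samples in the primary data


@dataclass
class Region:
    # Region Re approximated by a latitude/longitude bounding box (an assumption).
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat, lon):
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def evaluate(events, start, end, region, min_magnitude=5.0, max_gap=0.05):
    """Select events with M > min_magnitude, inside Tw and Re, whose
    waveforms have fewer gaps than max_gap."""
    return [
        e for e in events
        if e.magnitude > min_magnitude
        and start <= e.time <= end
        and region.contains(e.lat, e.lon)
        and e.waveform_gap_fraction < max_gap
    ]


# Made-up catalogue entries, used only to exercise the filter.
catalogue = [
    SeismicEvent("ev1", 5.6, datetime(2017, 1, 18), 42.5, 13.3, 0.01),
    SeismicEvent("ev2", 4.2, datetime(2017, 3, 2), 42.6, 13.2, 0.00),    # magnitude too low
    SeismicEvent("ev3", 6.1, datetime(2017, 6, 12), 38.9, 26.3, 0.12),   # too many gaps
    SeismicEvent("ev4", 5.9, datetime(2016, 10, 30), 42.8, 13.1, 0.02),  # outside Tw
]
tw_start, tw_end = datetime(2017, 1, 1), datetime(2017, 12, 31)
re_box = Region(35.0, 47.0, 5.0, 30.0)

matches = evaluate(catalogue, tw_start, tw_end, re_box)
print([e.event_id for e in matches])
```

The point of the sketch is that discovery-level metadata (type, origin) would not suffice here: the query depends on domain-specific fields such as magnitude and gap fraction being described consistently across providers.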

Scientific methods
Scientific methods support helps collaborating teams of experts create and refine methods that draw on the diverse resources and data collections. It promotes the formalisation and automation of these methods, typically as scientific workflows [10], while supporting critical procedures to deliver good quality evidence contributing to the shared knowledge.
Example: develop methods and models to reveal the impact on seismic hazard from mineral extraction methods. The authoring system consults the metadata catalogue to help the method developer make choices, detect defects and plan enactment. The enactment system consults the metadata catalogue to verify compliance with policies, to plan the optimal deployment and annotate provenance records. The provenance system links with the catalogue, mainly via identifiers, to support diagnostics, validation, reproducibility and evidence qualification.

Supporting shared agreements
Metadata catalogues play a central role in our framework: the currently agreed set of instances of Core Concepts is represented by these catalogues. Standard vocabularies provide a vital element of the Core Concepts. It is widely recognised that they help fulfil requirements 2.2.1 (resource discovery) and 2.2.2 (resource evaluation). In particular, vocabulary profiles enable validation via lists of allowed values, cardinality of elements and the specification of detailed application contexts.
However, in order to achieve semantic interoperability and the enactment of workflows (2.2.3) exploiting cross-domain resources, the mode of employment of such vocabularies must also be specified. This requires the definition of agreements about the interpretations and meanings of the values associated with vocabulary terms, and the formalisation of such agreements in the shared vocabulary. The latter can be achieved by introducing formal restrictions and constraints expressed for instance in OWL 6, SHACL 7 or ShEx 8.
Achieving shared agreements on the interpretation of vocabulary terms in multidisciplinary environments is not a trivial task. Even a common concept such as time can carry diverse semantics depending on the temporal reference or the calendar used in a specific context. For instance, in archaeology or geology time is often expressed by counting years backwards from a reference date. In a lunisolar calendar (e.g. the Chinese calendar) time is expressed according to astronomical phenomena. Those reasons inspired domain-specific formalisations, e.g. for geological timescales [11], and extensions of conventional representations such as OWL-Time 9 to include non-Gregorian calendars [12]. Figure 1 provides an example of the diversity in time scales which are present in solid-Earth sciences. Each of those might be associated with different reference systems and therefore a different semantics of time. The deployed instruments may resolve time with sub-microsecond resolution to triangulate signal sources. A conceptual framework spanning geological, historical and observational time needs clarity about the transitions and correspondences.
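The need to agree on a temporal reference can be illustrated with a small sketch. It assumes the common convention in which "years before present" (BP) counts backwards from 1950; the text above does not name a specific reference date, so this constant, the `TimeValue` type and the sample values are assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Assumed reference year for "before present"; other communities may agree on a different one.
BP_REFERENCE_YEAR = 1950


@dataclass(frozen=True)
class TimeValue:
    value: float
    reference: str  # "CE" (calendar year) or "BP" (years before the reference date)


def to_ce_year(t):
    """Map a time value onto a single calendar-year axis."""
    if t.reference == "CE":
        return t.value
    if t.reference == "BP":
        return BP_REFERENCE_YEAR - t.value
    raise ValueError(f"unknown temporal reference: {t.reference!r}")


historical_record = TimeValue(1669, "CE")  # illustrative value
dated_sample = TimeValue(15000, "BP")      # illustrative value

print(to_ce_year(historical_record))  # 1669
print(to_ce_year(dated_sample))       # -13050
```

Without the `reference` tag the number 2017 is ambiguous: read as a calendar year it is recent, read as years BP it lies before the start of the Common Era. Recording the agreed interpretation alongside the value is what the shared vocabulary has to guarantee.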
In order to support multi-disciplinary, multi-organisational and multi-national collaboration the underlying concepts must be recognised and agreed. These are often formalised as ontologies [14]. Collaborative development of such ontologies often reveals variations and encourages refinement of such concepts, illustrating the kind and scale of investment needed to build, agree and adopt the Core Concepts.

Related work
We observe initiatives that aim at supporting multidisciplinary research collaborations and investigate relevant enabling technologies and standards. We review those initiatives and technologies in terms of their contribution to three aspects of support: 1. agreeing a common set of concepts; 2. representing that agreement in human-readable and machine-actionable forms; and 3. provisioning of tools and platforms for collaboration. In most cases these are intricately bound together, but we show the value of considering them separately.
6 www.w3.org/owl/
7 www.w3.org/TR/shacl
8 https://shexspec.github.io/spec
9 www.w3.org/TR/owl-time/
Figure 1 (caption, beginning missing): [...] research that engages with the distant past as well as the present, such as the solid-Earth sciences. This is just part of the range encountered by sciences that observe to sub-microsecond resolution for today's observations to resolve hypothesised models spanning billions of years. Source: [13]

Agreeing common concepts
This predominantly manifests as defining agreed vocabularies and importing, extending or merging existing ones.

Schema.org
Schema.org is a vocabulary created in 2011 by Google, Microsoft and Yahoo to describe Web resources and improve the search of content on the Web, thus assisting search engines as they interpret pages in different contexts. Since its conception Schema.org has grown into a popular mechanism to represent structured data on the Web; it is supported by many tools and covers a variety of domains [15]. Schema.org consists of a hierarchy of classes and relationships; it is compliant with RDF and reuses existing standard vocabularies such as Dublin Core. Typically it is embedded in HTML pages using Microdata 10, JSON-LD 11 and RDFa 12.

W3C -DCAT and DCAT profiles
W3C has invested significant effort steering the development of a vocabulary to facilitate the interoperability of catalogues published on the Web, namely the Data Catalog Vocabulary (DCAT) 13. At present DCAT is a W3C Recommendation that has been endorsed by many players including scientific communities, policy makers and other stakeholders [16,17]. "By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications easily to consume metadata from multiple catalogs. It further enables decentralized publishing of catalogs and facilitates federated dataset search across sites. Aggregated DCAT metadata can serve as a manifest file to facilitate digital preservation".
Several profiles of DCAT have been produced to address different requirements and there is an active community supporting the uptake of their data model. Furthermore, DCAT is natively supported by catalogue platforms such as CKAN [18]. Examples of such profiles include: DCAT-AP [19], used to describe public sector datasets in Europe; GeoDCAT-AP [20], a DCAT-AP profile describing geospatial datasets, dataset series and services; and StatDCAT-AP, a DCAT-AP profile for statistical datasets [21].
One of the key features of DCAT is that it incorporates terms from existing and widely used vocabularies such as Dublin Core, SKOS and FOAF. This aspect increases its dissemination and facilitates adoption and uptake into existing systems.
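As an illustration of how DCAT combines its own terms with those of Dublin Core and FOAF, the sketch below builds a dataset description as a JSON-LD document using plain Python dictionaries. The helper name `describe_dataset`, the `example.org` identifiers and the sample values are hypothetical; only the namespace IRIs and the term names (`dcat:Dataset`, `dcat:Distribution`, `dcat:accessURL`, `dcat:keyword`, `dct:title`, `dct:publisher`, `foaf:name`) come from the published vocabularies.

```python
import json

# Namespace prefixes of the vocabularies that DCAT draws on.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def describe_dataset(identifier, title, keywords, publisher_name, access_url):
    """Return a JSON-LD description of one dataset with a single distribution."""
    return {
        "@context": CONTEXT,
        "@id": identifier,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:keyword": list(keywords),
        "dct:publisher": {"@type": "foaf:Agent", "foaf:name": publisher_name},
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:accessURL": {"@id": access_url},
            }
        ],
    }


# Hypothetical identifiers and values, for illustration only.
record = describe_dataset(
    identifier="https://example.org/dataset/seismic-waveforms",
    title="Seismic waveform archive",
    keywords=["seismology", "waveforms"],
    publisher_name="Example Data Centre",
    access_url="https://example.org/api/waveforms",
)
print(json.dumps(record, indent=2))
```

An application profile would add constraints on top of such a record, for example which properties are mandatory and which controlled vocabularies their values must come from.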
Application profiles try to fill the gaps in the base DCAT standard. Some gaps have been identified and discussed at the "Smart Descriptions & Smarter Vocabularies (SDSVoc)" 14 workshop organised by W3C and the VRE4EIC project 15. Although the current DCAT recommendation is recognised as a powerful tool to improve interoperability of datasets, further work and guidance are needed to extend its adoption and to tailor it to meet community requirements for particular IPC. The W3C Data Exchange Working Group (DXWG) 16 has recently been set up to collect and address requirements from the communities and help improve the DCAT data model.

Representing conceptual agreements
This depends on underpinning metadata and ontologies organised, combined, referenced and analysed using formalised models.

Metadata and interoperability
Metadata approaches have been widely discussed as methods to enable interoperability [22,23,24,25]. In the context of digital libraries the metadata interoperability issue has been recognised for a long time. As the mission of digital libraries is to acquire, preserve and provide access to a variety of heterogeneous digital objects, librarians quickly encountered issues related to the appropriate description of digital objects and developed standards and methods for their categorisation. For instance, standards-based metadata, metadata cross-walks or mappings, application profiles and metadata registries have been demonstrated to be valuable methods to enable schema-level metadata interoperability [26]. Those methods build on a classical interpretation of information organisation systems, mainly hierarchical and authoritative, thus reflecting an objectivist philosophical perspective [25].
However, that perspective has been considered inadequate to organise complex information [27]. The advent of social media stimulated collaborative approaches to metadata which exploit social tagging and yield folksonomies [28]. Such approaches reflect a social constructivist perspective of the world: they take into account heterogeneous viewpoints, fluidity of interpretation and knowledge sharing [25]. Although authoritative and collaborative, or in other words top-down and bottom-up, approaches might seem antithetic, they can coexist, providing complementary perspectives and, as advocated by Gruber, lead to ontologies for folksonomy [29]. Whilst top-down approaches contribute a "simplified" canonical view according to paradigms of classification that have been known to humans for a long time, folksonomies recognise the existence of different possible interpretations and account for specialisations and extensions known and understood by subgroups and individuals. The Web Annotation Vocabulary 17 is an example of an ontology supporting such a collaborative approach.
Semantic interoperability entails information sharing and exchange based on negotiated meanings and expressions [22]; it goes beyond the schema level, specifying how metadata records or content values are exchanged and used. Therefore, semantic interoperability deals with structure and includes interpretation leading to mutual understanding of concepts, relationships and their values. Alemu et al. argue that in order to achieve semantic interoperability metadata objects ought to be enriched with knowledge coming from collaborative and user-driven approaches [25]. Semantic web technologies can provide the appropriate support to achieve semantic interoperability and harmonisation [24]. This depends on leveraging declared vocabularies and mechanisms to extend them, unique identifiers that help avoid naming conflicts and duplications, and the ability to express relationships among resources and elements.
17 www.w3.org/TR/annotation-vocab

Shapes Constraint Language (SHACL)
The Shapes Constraint Language is a recent W3C Recommendation that is rapidly gaining interest in the semantic community. Semantic languages, such as OWL [30], offer a powerful means to describe terms and how they can be used but they lack a mechanism to record the applications of such terms. The latter is particularly useful to share and reuse knowledge among communities. Data shapes expressed in SHACL fulfil this requirement providing an effective and flexible tool for data integration. Shapes are RDF expressions that explain how data is organised. Those expressions include allowed rules, values, patterns and offer a powerful mechanism to formalise constraints and validate data structures.
They can be used as templates to model and query data structures. A number of use cases 18 for the application of SHACL are currently under discussion. The "Open Content Model" 19 (OCM) is an application context of particular interest for us. For instance, according to the OCM multiple independent applications might agree to share the same representation for common data items and allow the presence of undefined data items to account for specialisations in the diverse applications.
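The idea of a shape, and of the open versus closed treatment of undefined data items, can be sketched without an RDF toolkit. The code below is a simplified stand-in, not SHACL itself: the dictionary-based `SHAPE`, its rule names and the sample records are hypothetical, although the rules loosely mirror the SHACL constraint components `sh:minCount`, `sh:maxCount`, `sh:in`, `sh:pattern` and `sh:closed`.

```python
import re

# A toy "shape": per-property cardinality, allowed values and patterns.
SHAPE = {
    "identifier": {"min_count": 1, "max_count": 1, "pattern": r"^https?://"},
    "type": {"min_count": 1, "max_count": 1,
             "allowed": {"Dataset", "Service", "Facility"}},
    "keyword": {"min_count": 0, "max_count": None},
}


def validate(record, shape, closed=False):
    """Return a list of human-readable violations (empty if the record conforms)."""
    violations = []
    for prop, rule in shape.items():
        values = record.get(prop, [])
        if not isinstance(values, list):
            values = [values]
        if len(values) < rule.get("min_count", 0):
            violations.append(f"{prop}: too few values")
        max_count = rule.get("max_count")
        if max_count is not None and len(values) > max_count:
            violations.append(f"{prop}: too many values")
        for v in values:
            if "allowed" in rule and v not in rule["allowed"]:
                violations.append(f"{prop}: value {v!r} not allowed")
            if "pattern" in rule and not re.search(rule["pattern"], str(v)):
                violations.append(f"{prop}: value {v!r} does not match pattern")
    if closed:
        # A closed shape rejects properties it does not define.
        for prop in record:
            if prop not in shape:
                violations.append(f"{prop}: property not defined by the shape")
    return violations


# Record carrying a domain-specific extension property (hypothetical).
good = {"identifier": "https://example.org/ds/1", "type": "Dataset",
        "seismic:network": "IV"}
bad = {"identifier": "urn:x", "type": "Map"}

print(validate(good, SHAPE))               # open content model: extension tolerated
print(validate(good, SHAPE, closed=True))  # closed: extension reported
print(validate(bad, SHAPE))
```

Validating with `closed=False` corresponds to the open content model described above: applications agree on the common items while tolerating specialisations they do not know about.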

Platforms for collaboration
We include organisational and technical approaches to populate the conceptual space with concrete instances and examples of multidisciplinary infrastructures.

Computer Supported Cooperative Work
CSCW investigated the social aspects of knowledge sharing and the systems to support it. Such investigations yielded approaches to define and maintain 'common information spaces', to represent knowledge for instance by adopting a 'repository model' and/or exchange it via knowledge artifacts and 'boundary objects' [1,31,2,3]. An important branch of CSCW research focused on providing access to and exchanging expertise, recognising the importance of communication and helping establish communications among 'knowledgeable actors'. For these reasons CSCW research provided a fertile ground for a number of technical solutions currently adopted in knowledge management and collaborative systems.
18 www.w3.org/TR/shacl-ucr/
19 https://www.w3.org/2014/data-shapes/wiki/Open_Content_Model_Example

Research Data Alliance
The Research Data Alliance 20 is an international, multidisciplinary, community-driven organisation that is very active in the area of data sharing and exchange, data interoperability and data-driven innovation. Recommendations, infrastructure design, policies and various initiatives are emerging to lower the barriers to data sharing and accelerate innovation. Some of these initiatives have recently been endorsed by the European Commission 21 who recognises their importance for referencing in public procurement, in particular:

Virtual Research Environments (VREs) and related frameworks
Virtual Research Environments are well-known, powerful frameworks that enable collaborative science. VREs provide scientists and practitioners of communities of practice [33] with tools and working environments (or laboratories), usually accessible via the Web, that encompass data, services and computing-enabled features such as processing, visualisation, communication, data access and workspaces. Such environments can be deployed in different contexts, thereby serving the needs of a variety of communities; however, they usually target single disciplines or closely related topics. Recent developments demonstrated the feasibility of aggregating cross-cutting resources to offer VREs as a Service in order to maximise adoption and productivity in multidisciplinary contexts [34]. Similarly, Virtual Laboratories (VLs), Science Gateways (SGs), Virtual Organisations (VOs) and Digital Libraries (DLs) provide the necessary tools and interoperability to enable interactions and foster seamless access, usage and sharing of resources across diverse stakeholders [35,36]. There is substantial interest in the scientific community in VREs (VLs, SGs, VOs and DLs), which yields a flourishing scientific literature and many initiatives and research projects. However, as shown in a recent discussion at the RDA VRE-IG 23, the terminology and the definitions, although often overlapping, are still disputed and often subject to different interpretations. In our analysis, whilst acknowledging the diverse flavours, we use those terms interchangeably. Such systems deal with human-computer interaction and socio-organisational issues as well as authorisation and resource management. In this paper, we assume such a context and focus on supporting the process needed to build an underpinning alignment of concepts and information.
20 www.rd-alliance.org
21 http://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:

Virtual Observatories (VOs)
The concept of Virtual Observatory was first introduced by the astronomers as a means to discover, access and process data seamlessly [37,38]. The goal was to provide an abstraction layer on top of astronomical data provided by independent organisations following the analogy of the World Wide Web. The astronomy community produced a predominant example of successful, long-term collaboration led by the International Virtual Observatory Alliance (IVOA). IVOA discusses and promotes standards for interoperability, protocols for data access and exchange. Since its establishment in 2002 it has supported the astronomy community to establish innovative technical solutions at global scale, disseminate results and promote effective collaborative working practices [39].
Other examples of VO are: the CLARIN Virtual Language Observatory 24, targeting language resources [40]; and the Web Observatory (WO), a large system that enables multidisciplinary Web Science. WO focuses on data about the Web to study and understand the evolution of the Web, anticipating future trends and developments [41,42].

The Global Earth Observation System of Systems (GEOSS)
GEOSS is a global initiative coordinated by the Group on Earth Observations (GEO) to build a large-scale network of content providers into a single overarching system. It embraces the most important existing infrastructures for Earth Observation at a global scale. GEOSS adopts the well-known System of Systems (SoS) approach, where many autonomous, independent systems are coherently networked and co-operate to achieve common goals [45]. The GEOSS Common Infrastructure (GCI) is the e-infrastructure that underpins GEOSS and leverages the distributed independent resources, harmonising data and models and providing access to resources, applications and products. The GCI exploits a brokering approach to provide users with transparent access to the distributed resources [46]. The concept of SoS captures the common issue of integrating many independent, autonomous systems in order to achieve a global common goal. GEOSS aims to provide decision support tools and what-if analyses, with information and knowledge delivery as a goal. Santoro et al. [47] introduce the Model Web framework that captures business processes as workflows. To address the Science-to-IT barrier they leverage models, workflows, vocabularies and knowledge bases. Their focus is primarily on how to combine and use those resources, whereas our focus is on how to support their construction and harmonisation, leveraging Core Concepts for collaborations.

Summary of related work
We presented three aspects to support collaborations and achieve interoperability. In

Approach and methodology
From our assessment it appears clear that the construction of the conceptual framework that enables effective collaboration has to be led by humans. Scientific communities, users and stakeholders of an IPC assume a central role in guiding the construction and maintenance processes. Those shaping the IPC develop and maintain its conceptual core by assessing which concepts can be consistently used and interpreted across the consortium.
They often proceed by importing large established vocabularies with their corresponding definitions and relationships. They need to manage the relationships between such conceptual bundles eventually extending or pruning them in order to meet the requirements of their IPC. They must recognise where creative diversity exists and leave opportunity for agile innovation in these conceptual spaces.
Our approach combines top-down and bottom-up strategies to formulate the agreed core set of shared concepts and achieve semantic interoperability in IPC. We propose that this progresses by building a Canonical Core (CC) that includes sufficient Core Concepts that are agreed and adopted to enable the principal interdisciplinary collaborations to proceed.
The extensions needed beyond this CC to support innovation, experiment and local specialisations are supported by dependable relationships with the CC. Approaches based on reference ontologies have been profitably applied in more controlled contexts e.g. in the industry [48,49,50]. We build on those results to devise a solution for the challenging IPC context.
The whole process exploits co-design, bringing together data and metadata-modelling experts with domain scientists. Similarly to data models and their representations, the rules of engagement or "contracts" to participate in the IPC are critical. Such rules are discussed and defined with the designated communities and leverage existing community standards and practices.
The conceptual definition (1) constitutes an unbounded conceptual space independent from the other dimensions. In this paper we provide an approach to manage the complexity of that space, and apply such an approach to the concrete context of EPOS. We also propose a representation (2) fitting the designed space and meeting the requirements of the identified designated communities. Finally, we validate the chosen representation populated with a selection of real instances (3).

Principles underlying the conceptual definition of the Canonical Core
The conceptual definition of the CC needs to address three aims italicised below. The following principles shape the concepts, relationships and structure of the CC.
1. Achieve sufficient coverage of the behaviours required across the designated communities so that the CC supports their interactions with the shared information and with each other, thereby facilitating collaboration leading to adoption and reuse.
2. Establish agreed interpretations of the Core Concepts that are adopted by the designated communities -when such agreements cannot be reached allocate the concepts to an extension for the relevant subcommunity coupled to the core via identified conceptual hooks -thereby achieving harmonisation without inhibiting innovation.
3. Validate the CC against a broad and representative set of use cases, thereby ensuring priority collaborative behaviours are enabled and achieving trustworthiness and completeness.
The volume and complexity is controlled by limiting the core to accepted and agreed material. Contenders for inclusion develop in the dynamically connected boundary regions.
The set of use cases is extended to fulfil all critical requirements and to ensure that the CC covers the essentials.
According to the principle (1), rather than building from scratch we select and import existing conceptual bundles, information spaces, boundary objects and knowledge artifacts [1,31,52,53] into the CC. This adoption of existing bundles has two motivations: a) to retain intellectual effort -as bundles are often the result of long and costly negotiation (implicit and explicit); and b) to facilitate understanding and automated interaction -as communities and their automated methods will recognise familiar patterns and artifacts.
Nonetheless, the CC cannot be just the union of pre-existing bundles -harmonisation (2) plays an essential role. Without harmonisation the CC would be a collection of information silos that preserve domain specific structures together with their boundaries. This would result in a data warehouse that collects data unchanged, thus failing our principal goal that is to facilitate boundary crossing by providing holistic semantic integration.
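The difference between a plain union of bundles and a harmonised core can be sketched as follows. This is a minimal illustration under stated assumptions, not the mechanism used in EPOS: the `CROSSWALK` table, the domain names, the field names and the sample records are all hypothetical. Fields with an agreed interpretation are mapped onto shared core concepts; everything else is kept as a domain-specific extension rather than discarded.

```python
# Hypothetical crosswalk: domain-specific field name -> agreed core concept.
CROSSWALK = {
    "seismology": {"mag": "magnitude", "origin_time": "time", "sta": "station"},
    "geodesy": {"epoch": "time", "site": "station", "vel_mm_yr": "velocity"},
}


def harmonise(domain, record, crosswalk=CROSSWALK):
    """Split a domain record into harmonised core fields and local extensions."""
    mapping = crosswalk[domain]
    core, extension = {}, {}
    for field, value in record.items():
        if field in mapping:
            core[mapping[field]] = value
        else:
            # No agreed interpretation yet: keep it in the domain's extension.
            extension[field] = value
    return {"source": domain, "core": core, "extension": extension}


# Made-up records from two independently managed sources.
seismic = {"sta": "ABC", "mag": 5.2, "origin_time": "2017-01-18T10:25:00Z",
           "depth_km": 9.0}
geodetic = {"site": "ABC", "epoch": "2017-01-18", "vel_mm_yr": 3.1}

harmonised = [harmonise("seismology", seismic), harmonise("geodesy", geodetic)]

# A cross-domain question that the raw union could not answer directly,
# because the two sources name the station field differently.
at_station = [h["source"] for h in harmonised if h["core"].get("station") == "ABC"]
print(at_station)
```

In the raw records the same station is addressed as `sta` in one source and `site` in the other, so a union would preserve exactly the silo boundaries described above.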
We harness real use cases (3) to tease out and clarify the objectives and aims of the designated communities whose work and communication will be mediated via the CC when they venture across previous boundaries. To turn an unbounded conceptual space into a manageable space we follow the communities' priorities. As use cases evolve and change, the associated dependencies and boundaries follow accordingly, thereby identifying required extensions and modifications to the core. Hence, the CC has a clear requirement for flexibility and support for evolution. These guiding principles shape the construction and evolution of the Core Concepts.

Principles underlying the representation of the Canonical Core
Representation entails metadata: it reflects aspects of the real world for intended purposes and viewpoints [54,55]. [...] 25 and extended by Nilsson et al. [57,24]. These deliver the following:
1. Extensibility, ability to create and add new structures to a metadata standard for "application-specific or community-specific needs".
2. Modularity, "ability to combine metadata fragments adhering to different standards".
4. Multilingualism, "ability to express, process and display metadata in a number of linguistic and cultural circumstances".
5. Machine-processability, "ability to automate processing of different aspects of the metadata specifications".
These principles fit the characteristics of an IPC as they assume and acknowledge the coexistence of multiple standards and different specifications. They also enable the collaborative approach; for instance, members of the designated communities can annotate existing content, creating new relationships (1) [...]
8. Effectiveness representing the required concepts for the selected application scenario. For instance, verbosity might be more effective in machine-to-machine exchanges whereas terseness might help human reading and understanding (e.g. Turtle/RDF 26, [...]
9. Performance of the encoding/decoding processes, required to marshall and unmarshall the content of the core. This is an important non-functional engineering aspect that influences the overall behaviour of the system and in particular of the population described in the next section.

Principles underlying the population of the Canonical Core
The population describes the distribution in time of the entities (instances of concepts and instances of relationships between them) in the CC. Population is a dynamic process that is guided by the principles listed below.
1. The strategy adopted to populate the CC is influenced by several factors e.g.
3. Quality control is fundamental to manage the population of the core. Quality indicators must be used to assess new entities and providers of entities as well as to modify the population, for instance by removing entities that do not conform to defined quality standards. Pruning, clean-up, deduplication and notification mechanisms can be implemented exploiting such quality indicators.
4. Governance, for instance, existing community agreements associated to specific bundles might influence the population strategy and require access control mechanisms.
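The quality-control principle can be sketched as a small curation step. This is only an illustration under stated assumptions: the `Entity` fields, the numeric quality indicator in [0, 1] and the threshold are hypothetical and are not defined in the framework.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Entity:
    identifier: str  # assumed unique key, used here for deduplication
    provider: str
    quality: float   # hypothetical quality indicator in [0, 1]


def curate(entities: Iterable[Entity], min_quality: float = 0.5) -> Tuple[List[Entity], List[Entity]]:
    """Return (kept, rejected).

    Entities below the threshold are pruned; among duplicates (same
    identifier) only the highest-quality entity is kept. The rejected list
    can feed a notification mechanism towards the providers.
    """
    best: Dict[str, Entity] = {}
    rejected: List[Entity] = []
    for entity in entities:
        if entity.quality < min_quality:
            rejected.append(entity)  # pruning
            continue
        current = best.get(entity.identifier)
        if current is None:
            best[entity.identifier] = entity
        elif entity.quality > current.quality:
            rejected.append(current)  # deduplication: replace the weaker copy
            best[entity.identifier] = entity
        else:
            rejected.append(entity)   # deduplication: drop the weaker copy
    return list(best.values()), rejected
```

Governance constraints, such as access control on specific bundles, would sit in front of a step like this and are deliberately left out of the sketch.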

Considerations about the boundary regions
In the previous sections we focused our analysis on the characteristics of the CC and briefly mentioned boundary regions (BR). The CC is an abstraction layer avoiding the complexity of the BR: the core falls under a federation-wide governance whereas BR are independently controlled. For this reason it is difficult to provide a full characterisation of BR. Therefore our focus is on the interface between the boundary regions and the core and on the "rules of engagement". Such rules can be modelled leveraging the 'boundary objects' concept introduced by Star and Griesemer [1,31].

BR expose a bounded-openness -new boundary regions can be added, removed and
at the same time each region can contribute new bundles to the core, provided they fulfil the agreements negotiated with the core.
3. Popular bundles are easily recognised, connected and imported into the core, as they typically gather consensus and form standards, whereas less popular bundles constitute extensions. The value of both must be preserved and accounted for, thus the interface has to support both cases and allow differences. In 1945 Vannevar Bush, describing the memex, wrote "trails that are not frequently followed are prone to fade, items are not fully permanent, memory is transitory" [58]. This captures very well the requirement for promoting and highlighting extensions based on diverse criteria in order to engage and attract users and to avoid unproductive migrations to other systems, dispersions and so-called "skunk work", where researchers hide their activities to achieve agility and flexibility, with consequent loss of evidence for reproducibility and sharing.
To address these requirements the interface between the core and the boundary regions can be modelled as an API for managing extensions. Such an API supports the following

Building the EPOS Canonical Core
In this section we describe an application of the approach introduced in section 4. We apply our methodology to establish the EPOS CC addressing its three dimensions: definition, representation and population.

European Plate Observing System (EPOS)
The European Plate Observing System (EPOS) 28    The metadata describing data and assets are hosted in the EPOS ICS Metadata Catalogue (EIMC). The EPOS CC is represented in the EIMC that underpins the organisation of integration processes and fosters interoperability between the multidisciplinary data, products, software, services and resources of the contributing research communities.

Definition of the EPOS Canonical Core
The definition of the EPOS CC is conducted by the EPOS metadata group (that includes diverse expertise) based on a set of requirements and use cases collected during the FP7 EPOS-PP (Preparatory Phase) and H2020 EPOS-IP 30  The DDSS survey is a valuable asset given the wide scope and heterogeneity of EPOS.
Reaching agreement on it required a strong engagement strategy with the communities, exploiting several communication channels. Starting from the DDSS, a finer-grained classification has been produced with incremental refinements leading to the definition of the EPOS CC. Such refinements were influenced by geospatial standards (e.g. ISO19115) and the CERIF data model [60,61]   ping concepts whose definitions might be adopted unaltered by a different community (e.g. seismic waveform). However, specialisations, modifications and partial reuse have to be accounted for. In some cases similar concepts may have different interpretations (e.g. quality data). The CC has to accommodate diversity and support a range of required scenarios.
The concepts and entities collected in the CC support the use cases and requirements developed by those supporting the IPC and described in section 2.2. The EPOS CC definition is an ongoing process that will continue after EPOS has transitioned to its operational phase [62]. The conceptual framework established and described here will be a valuable tool to support the evolution of this core.

EPOS Canonical Core representation -EPOS-DCAT-AP
After completing the conceptual definition of the first version of the EPOS CC, the next step was to find a suitable representation that would meet the requirements of the desig-    Along with the overview of the communities' assets, information was collected about the formats, conventions, vocabularies and standards adopted by the communities to represent their resources. In particular the survey revealed that several domain-specific standards co-exist with broader standards. The adoption of standards and shared practices depends on the maturity of the communities. They can be quite heterogeneous. Table 1 [64] and addresses the following concerns:
• Extending the data model with additional concepts required by the EPOS CC (e.g. Equipment, Facility, Publication, WebService and Software).
• Introducing new relationships and roles.
• Describing APIs for the programmatic access to datasets.
• Strengthening engagement with scientific communities supporting the inclusion of domain specific knowledge.
The latest version of the EPOS-DCAT-AP data model is available online 34  It is worth mentioning that the availability of tools that allow representational translation, such as X3ML by FORTH [65], might make the choice of a specific representation less critical. Where needed, multiple representations might coexist without affecting the conceptual definitions of the CC.
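To make the idea of extending a DCAT-based data model with an additional concept more concrete, the sketch below emits a small RDF/Turtle description from Python. It does not reproduce the actual EPOS-DCAT-AP vocabulary: the `ex:` namespace, the `ex:Equipment` class and the example IRI are hypothetical, while `dcat:` and `dct:` are the standard DCAT and Dublin Core Terms namespaces. Only plain string literals are handled.

```python
from typing import Dict

PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "ex": "http://example.org/ext#",  # hypothetical extension namespace
}


def to_turtle(subject: str, rdf_type: str, properties: Dict[str, str]) -> str:
    """Serialise one resource with string-valued properties as Turtle."""
    lines = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in PREFIXES.items()]
    lines.append("")
    body = [f"<{subject}> a {rdf_type}"]
    for predicate, value in properties.items():
        # Escape backslashes and double quotes inside the literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        body.append(f'    {predicate} "{escaped}"')
    lines.append(" ;\n".join(body) + " .")
    return "\n".join(lines)


# An extension concept described with properties reused from a broader standard.
turtle = to_turtle(
    "http://example.org/id/station-1",
    "ex:Equipment",
    {"dct:title": "Broadband seismometer", "dct:identifier": "station-1"},
)
```

The same function can describe a standard `dcat:Dataset`, which shows how core and extension concepts can share one representation.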

Population of the EPOS Canonical Core
Once the EPOS Core Concepts have been identified and agreed, and an appropriate representation chosen, the next step is the population of the CC with real entities from the designated communities. This requires close interaction and collaboration between domain and metadata experts. Ultimately, population needs to be a process that is automated as far as possible. But this requires preparatory work. First experts need to agree the data sources for each concept. They then need to develop import-transformations and protocols. These may stimulate changes at sources and in the CC. Once validated, the parties involved need to agree to sustain the relationships and then an automated process can be coded and run whenever necessary.
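The preparatory steps described above (agreeing the sources, developing import-transformations, validating, then automating) can be sketched as follows. This is a minimal illustration, not the EPOS tooling: the source field names (`sta_code`, `name`, `lat`), the target properties and the validation rule are assumptions.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical agreed mapping: CC property -> (source field, transformation).
Mapping = Dict[str, Tuple[str, Callable[[str], object]]]

STATION_MAPPING: Mapping = {
    "identifier": ("sta_code", str.strip),
    "title": ("name", str.strip),
    "latitude": ("lat", float),
}


def transform(record: Dict[str, str], mapping: Mapping) -> Dict[str, object]:
    """Import-transformation: rename and convert the fields of one source record."""
    return {target: convert(record[source]) for target, (source, convert) in mapping.items()}


def validate(entity: Dict[str, object], required: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the entity is acceptable."""
    return [f"missing {name}" for name in required if entity.get(name) in (None, "")]


def populate(records, mapping: Mapping, required: Iterable[str]):
    """Automated run: transform and validate every record from an agreed source."""
    required = list(required)
    accepted, problems = [], []
    for record in records:
        try:
            entity = transform(record, mapping)
        except (KeyError, ValueError) as exc:
            problems.append((record, [repr(exc)]))  # malformed source record
            continue
        errors = validate(entity, required)
        if errors:
            problems.append((record, errors))
        else:
            accepted.append(entity)
    return accepted, problems
```

Records that fail are returned together with the reasons, so that they can be fed back to the source; this mirrors the observation that the process may stimulate changes both at the sources and in the CC.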
To kick off the population process, dedicated meetings and workshops were organised targeting the EPOS communities. Documentation, training material, demos and webinars were delivered prior to the face-to-face events in order to inform and prepare the communities for the effort required. This was needed to build motivation and stimulate the commitment of effort. Moreover, collaborative tools such as wiki and shared repositories 36 have been set up to collect the inputs and feedback from the communities and share documentation and results. To achieve the preliminary population of each community's bundle into the EPOS CC, the communities had to map their resources to the corresponding concepts of the EPOS CC with support from the EPOS-DCAT-AP experts. Due to the scale and complexity of this process, the mapping has been carried out in stages, prioritising specific entities and adopting in an initial phase a simplified XML representation. Table 2

Evaluation
Providing a complete evaluation of the impact of the presented framework in the EPOS community is unfeasible at this stage for several reasons. The nature and scope of the issues addressed in this research require a longer time scale to be effectively measured.
There are individual and organisational aspects that influence adoption and uptake. Those are critical within a single organisation and become much harder in multi-organisational and multi-disciplinary contexts. We target sharing behaviours and working practices that require time to assimilate novel elements. EPOS is currently in its implementation phase [62]; for a more complete assessment, evaluations ought to be repeated when it is transitioning to its operational phase. These should then be repeated periodically to detect trends.
Our framework builds on similar approaches that exploit catalogues and agreed canonical forms in the seismological domain [66]. That experience provided us with useful evidence of benefits and adoption although it has been applied in a more tractable context.
In this section we report an assessment of our work by highlighting some of the challenges encountered and addressed engaging with the EPOS communities. In a recent meeting 40    would be useful to repeat this evaluation when more experience has been acquired and to assess the benefits delivered.
To conclude this analysis, we highlight some key outcomes: the collaborative interaction has been very successful and productive; it allowed us to collect feedback and improve many aspects in order to better support communities' requirements. It encouraged us to think about previously unanticipated issues and developed a common vocabulary and understanding about concepts. This suggests we have a foundation and modus operandi for sustainable incremental progress.

Conclusions and future work
In this paper we have introduced the concept of Information-Powered Collaborations (IPC), an abstraction that captures the complex dynamics of a modern research context that depends on multi-organisational, multi-disciplinary, multi-national collaboration with increasing complexity and scale. We proposed the formation of an explicit Canonical Core (CC) as their foundation for information sharing and a framework that partitions the complex task of agreeing and maintaining a consistent set of shared Core Concepts to sustain interdisciplinary collaboration. That set has three independent aspects: conceptual definition, representation and population. We have demonstrated how such a framework facilitates the construction and evolution of the information space underpinning an IPC by enabling successive refinements of the three aspects. For instance, communities who are mainly interested in having their entities (e.g. data, services and methods) available in the CC will focus on the population. Those developing automated methods might find that the current representation is missing aspects they need and therefore require additions to the representation of the CC. Similarly, someone interested in extending high-level goals might enrich the set of Core Concepts. Thanks to our framework those issues can be addressed independently and progressively, thereby exploiting a separation of concerns. Another important advantage of our framework is that it supports innovation, experiments and heterogeneity. It enables the retention of valued working practices in the Boundary Regions until it is beneficial to transition them into the core, thereby minimising disruption, avoiding constraints and pursuing continuous incremental adoption. Furthermore, it fosters more efficient communication and progressively negotiated agreements between the stakeholders by partitioning the dialogue.
As communication is particularly challenging in multi-disciplinary, multi-cultural environments, the presented framework provides a significant advance that has been tested in EPOS. We will continue with this approach in EPOS. In particular we plan to: 1. maintain the current set of Core Concepts, evolving the Canonical Core when required by new requirements and use cases; 2. further develop and refine the EPOS-DCAT-AP representation, strengthening the collaboration with the W3C by working with the DXWG in order to make it available for other communities; 3. provide tools leveraging existing components to better support the designated communities in the automated population of their entities, for instance by means of graphical interfaces, converters and mapping services; and 4. work on the integration of annotation management tools such as EUDAT B2Note 43 to further exploit the collaborative approach.
Establishing collaborative knowledge to achieve holistic integration and semantic interoperability is an extremely complex task of wide interest that requires alignment of technical, organisational and cultural factors. In order to succeed in this endeavour implications and issues ought to be recognised and addressed effectively, stakeholders acknowledged and good behaviour properly rewarded, e.g. by promoting evidence of enhanced scientific results and increasing return on investments. Accommodating local diversity while encouraging migration towards and engagement with the core is essential for sustaining effective collaboration. Although a long way still remains along this path, we believe that the set of principles, the philosophy and the approach proposed are important initial steps.

Appendix A. EPOS-DCAT-AP and examples of its application
In this appendix we present a simplified UML class diagram of the EPOS-DCAT-AP model and examples of encodings in the RDF/Turtle notation. Figure    ... .