Semantic data modeling based on CIDOC CRM for Mesolithic footprints analysed with a multi-method approach (CAA 2026 Vienna)
Authors/Creators
Description
This presentation is about the development of a semantic data model for the analysis of prehistoric human footprints found in caves. The initial data set consisted of metadata on 156 human footprints discovered in a cave in France. The primary goal is to prepare the research data documented in Excel in a sustainable way that adheres to the FAIR principles. Further data sets of footprint analyses are to be integrated later in the project.
Although the current Excel data are machine-readable, they lack the semantic context and interconnections needed to derive meaningful information from the extensive data set of approximately 7800 individual facts (cells). To overcome this, the data are to be semantically enriched and interlinked so that they can be processed while remaining accessible and reusable.
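The figure of roughly 7800 facts corresponds to 156 rows × 50 columns. The minimal Python sketch below assumes a sheet loaded as a list of rows whose first column holds a footprint identifier. The column names and values are invented placeholders. It shows how each cell becomes one flat fact, and why such a fact says nothing about what its column means.

```python
# Minimal sketch: one flat (row, column, value) fact per spreadsheet cell.
# Column names, row identifiers and values are invented placeholders.
from collections.abc import Iterator

Fact = tuple[str, str, str]  # (row identifier, column name, cell value)


def cells_to_facts(header: list[str], rows: list[list[str]]) -> Iterator[Fact]:
    """Yield one fact for every non-empty cell.

    Assumes the first column of each row holds the row identifier.
    """
    for row in rows:
        row_id = row[0]
        for column, value in zip(header, row):
            if value != "":
                yield (row_id, column, value)


# Dummy sheet: 156 footprints x 50 columns (an id column plus 49 data columns).
header = ["id"] + [f"col_{i}" for i in range(1, 50)]
rows = [[f"fp_{n}"] + ["x"] * 49 for n in range(1, 157)]

facts = list(cells_to_facts(header, rows))
print(len(facts))  # 7800
print(facts[1])    # ('fp_1', 'col_1', 'x'): no hint of what col_1 means
```

A fact such as ('fp_1', 'col_1', 'x') is machine-readable but carries no semantics. The data model addresses this gap by replacing bare column names with ontology classes and properties.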
The analytical approach combines two methods of knowledge generation: a morpho-metric approach that uses quantitative analysis to assess footprint characteristics, and a morpho-classificatory approach that incorporates qualitative conclusions from indigenous track readers. These experts provide valuable information about the individuals who left the footprints, such as their sex, age, weight, body posture and activities. Even events involving several participants acting with a shared intention could be inferred.
The Excel sheet includes about 50 columns detailing various parameters relevant to the footprints, including topographical specifications, identification metrics, and correlations to specific activities. To establish a robust data model, the CIDOC CRM ontology (version 7.1.3) was employed. This ontology is designed to represent cultural objects and phenomena of the real world and helps to integrate heterogeneous cultural heritage information (https://cidoc-crm.org/). Its implementation as a formal language in OWL ensures machine readability. For the data model, concepts from the extensions CRMsci (Scientific Observation Model) and CRMarchaeo were added. Where even more specific concepts were needed, new classes and properties were defined in a project-specific ontology (prefix “cora”).
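Because the model combines four vocabularies, every identifier carries a prefix (cidoc-crm, crmsci, crmarchaeo, cora). The Python sketch below shows how such prefixed names expand to full IRIs and how a matching Turtle prefix header could be generated. All namespace IRIs in it are example.org placeholders and are not the official namespaces of CIDOC CRM, its extensions or the project ontology.

```python
# Hypothetical sketch: resolve the prefixed names used in the data model.
# All namespace IRIs are placeholders (example.org), not the official ones.
PREFIXES = {
    "cidoc-crm": "https://example.org/cidoc-crm/",
    "crmsci": "https://example.org/crmsci/",
    "crmarchaeo": "https://example.org/crmarchaeo/",
    "cora": "https://example.org/cora/",
}


def expand(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a prefixed name such as 'cora:Spoor' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def turtle_prefix_block(prefixes: dict[str, str] = PREFIXES) -> str:
    """Render the prefixes as Turtle @prefix declarations."""
    return "\n".join(f"@prefix {p}: <{iri}> ." for p, iri in prefixes.items())


print(expand("cora:Spoor"))  # https://example.org/cora/Spoor
print(turtle_prefix_block())
```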
The software WissKI (https://wiss-ki.eu) was developed to bring semantics and research data together and to archive them adequately. Its core component, the pathbuilder, builds, configures and groups the semantic paths that describe the data as a network in all necessary detail. Since WissKI is based on the content management system Drupal, every path has a corresponding field as output (fields are bundled in masks), in which the data is displayed. As is standard, the semantic data is stored as RDF triples, and each resource is identified by a unique URI (WissKI URI). In this way, data and semantics are stored in the same place and can be addressed unambiguously. Export formats from the triple store include JSON, N-Quads and Turtle, among others. Since WissKI is a web-based application, accessibility is ensured in principle. It is then up to the data owner to grant and manage access to the data through the fine-grained user management of the underlying CMS.
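To illustrate what a semantic path expresses, here is a hypothetical Python sketch. It does not reproduce WissKI's pathbuilder. It assumes a path is a list that alternates class, property, class, and so on, and ends with a datatype property holding the field value. Each class on the path gets an instance, and the result is serialized as Turtle, one of the export formats named above. The identifiers P1_is_identified_by, E42_Identifier and P190_has_symbolic_content are CIDOC CRM names, but the path itself, the instance names and the example.org prefix IRIs are invented.

```python
# Hypothetical sketch, NOT WissKI's implementation: a path alternates
# class, property, class, ... and ends with a datatype property that holds
# the field value. Prefix IRIs, instance names and the path are invented.
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Literal:
    value: str


PREFIXES = {
    "data": "https://example.org/data/",
    "cora": "https://example.org/cora/",
    "cidoc-crm": "https://example.org/cidoc-crm/",
}

# Example path: footprint -> identified by -> identifier -> symbolic content.
ID_PATH = [
    "cora:Spoor",
    "cidoc-crm:P1_is_identified_by",
    "cidoc-crm:E42_Identifier",
    "cidoc-crm:P190_has_symbolic_content",
]


def expand_path(path, value, subject, ids):
    """Turn a path and one field value into triples.

    `subject` is the existing instance of the first class; every further
    class on the path gets a fresh instance name drawn from `ids`.
    """
    if len(path) < 2 or len(path) % 2:
        raise ValueError("path must alternate class/property and end with a property")
    classes = path[0:-1:2]
    links = path[1:-1:2]
    instances = [subject] + [f"data:node_{next(ids)}" for _ in classes[1:]]
    triples = [(inst, "a", cls) for inst, cls in zip(instances, classes)]
    triples += [(a, p, b) for a, p, b in zip(instances, links, instances[1:])]
    triples.append((instances[-1], path[-1], Literal(value)))
    return triples


def to_turtle(triples):
    """Serialize prefixed-name triples as a small Turtle document."""
    def term(t):
        if isinstance(t, Literal):
            escaped = (t.value.replace("\\", "\\\\")
                       .replace('"', '\\"').replace("\n", "\\n"))
            return f'"{escaped}"'
        return t

    header = [f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items()]
    body = [f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples]
    return "\n".join(header + [""] + body)


triples = expand_path(ID_PATH, "FP-001", "data:fp_001", count(1))
print(to_turtle(triples))
```

Passing the existing footprint instance as `subject` lets several fields (paths) attach their values to the same node instead of creating duplicates.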
A main step in developing the data model was to define key concepts representing the core issues of the project, including its context. The footprint, as the starting point of the analysis, is represented by instances of the class cora:Spoor, a subclass of crmsci:S20_Rigid_Physical_Feature. Since the aim is to represent a real-world phenomenon, the footprint is treated as a materially visible physical feature with a relatively stable and invariant form. To represent the topographical context of a footprint, subclasses of the CIDOC CRM class E27_Site were defined. Expressed as part-of relations, a cora:Spoor is located in (cidoc-crm:P59i) a cora:Spatial_Unit, which forms part of (cidoc-crm:P46i) a cora:Cave.
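The class hierarchy and the part-of chain can be checked mechanically. The Python sketch below holds them in plain dictionaries. The superclass links for the CRM classes reflect our reading of CIDOC CRM 7.1.3 and CRMsci and are simplified to single inheritance. The instance names (data:fp_001, data:su_A, data:cave_1) are invented.

```python
# Hypothetical sketch of the concepts described above, held in plain dicts.
# Superclass links follow our reading of CIDOC CRM 7.1.3 and CRMsci (single
# inheritance only, for brevity); instance names are invented.
SUBCLASS_OF = {
    "cora:Spoor": "crmsci:S20_Rigid_Physical_Feature",
    "crmsci:S20_Rigid_Physical_Feature": "cidoc-crm:E26_Physical_Feature",
    "cora:Spatial_Unit": "cidoc-crm:E27_Site",
    "cora:Cave": "cidoc-crm:E27_Site",
    "cidoc-crm:E27_Site": "cidoc-crm:E26_Physical_Feature",
    "cidoc-crm:E26_Physical_Feature": "cidoc-crm:E18_Physical_Thing",
}

P59I = "cidoc-crm:P59i_is_located_on_or_within"
P46I = "cidoc-crm:P46i_forms_part_of"

INSTANCE_OF = {
    "data:fp_001": "cora:Spoor",
    "data:su_A": "cora:Spatial_Unit",
    "data:cave_1": "cora:Cave",
}

LINKS = [
    ("data:fp_001", P59I, "data:su_A"),
    ("data:su_A", P46I, "data:cave_1"),
]


def is_subclass(cls, ancestor):
    """True if cls equals ancestor or reaches it via subclass links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def is_instance(node, cls):
    return is_subclass(INSTANCE_OF[node], cls)


def containers(node):
    """Follow P59i once, then P46i repeatedly; innermost container first."""
    outgoing = {(s, p): o for s, p, o in LINKS}
    chain = []
    current = outgoing.get((node, P59I))
    while current is not None:
        chain.append(current)
        current = outgoing.get((current, P46I))
    return chain


print(is_instance("data:fp_001", "crmsci:S20_Rigid_Physical_Feature"))  # True
print(is_instance("data:fp_001", "cidoc-crm:E27_Site"))                 # False
print(containers("data:fp_001"))  # ['data:su_A', 'data:cave_1']
```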
Documenting research results requires documenting how they were generated. A crucial aspect of modeling with CIDOC CRM is representing reality through events, here mostly understood as attribute assignments (cidoc-crm:E13_Attribute_Assignment) that link information from knowledge generation to the research objects. Activities such as assigning an ID to a footprint (cora:ID_Assignment), measuring its size and its distance to other footprints (cidoc-crm:E16_Measurement and cora:Distance_Measurement), determining its state of preservation (cidoc-crm:E14_Condition_Assessment), and making statements about its creation, including deductions about the human who made it and their activities (crmsci:S5_Inference_Making, see https://cidoc-crm.org/extensions/crmsci/html/CRMsci_v2.0.html#S5), are represented via subclasses of cidoc-crm:E13_Attribute_Assignment. The outcome of the inference making represents the statements agreed on by consensus among the track readers (subject, activity, event and trackway).
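The event-centric pattern can be sketched as follows. Each assignment becomes its own node, linked to the footprint via the generic E13 properties P140 assigned attribute to and P141 assigned, and to the responsible actor via P14 carried out by. Specialized classes have more specific subproperties, for example P39 measured and P40 observed dimension for E16 Measurement. This sketch uses only the generic ones. The instance, value and actor names are invented.

```python
# Hypothetical sketch of the event-centric pattern: every statement about a
# footprint is an attribute-assignment event pointing at the footprint
# (P140) and at what was assigned (P141), plus who carried it out (P14).
# Instance names and actors are invented.
from itertools import count

ASSIGNMENT_CLASSES = {
    "cora:ID_Assignment",
    "cidoc-crm:E16_Measurement",
    "cora:Distance_Measurement",
    "cidoc-crm:E14_Condition_Assessment",
    "crmsci:S5_Inference_Making",
}


def assign(kind, target, assigned, actor, ids):
    """Return the triples of one attribute-assignment event."""
    if kind not in ASSIGNMENT_CLASSES:
        raise ValueError(f"{kind} is not an attribute-assignment class in this sketch")
    event = f"data:event_{next(ids)}"
    return [
        (event, "a", kind),
        (event, "cidoc-crm:P140_assigned_attribute_to", target),
        (event, "cidoc-crm:P141_assigned", assigned),
        (event, "cidoc-crm:P14_carried_out_by", actor),
    ]


ids = count(1)
graph = []
graph += assign("cidoc-crm:E16_Measurement", "data:fp_001",
                "data:dimension_1", "data:measuring_team", ids)
graph += assign("crmsci:S5_Inference_Making", "data:fp_001",
                "data:statement_1", "data:track_readers", ids)

# Which events made statements about footprint fp_001?
events = sorted(s for s, p, o in graph
                if p == "cidoc-crm:P140_assigned_attribute_to" and o == "data:fp_001")
print(events)  # ['data:event_1', 'data:event_2']
```

Because every statement is an event node, provenance (who, and by which kind of analysis) remains attached to each result instead of being lost in a spreadsheet cell.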
All analyses of a single footprint are bundled by an instance of the class crmsci:S4_Observation. This class “comprises the activity of gaining scientific knowledge about particular states of physical reality through empirical evidence, experiments and measurements” (see https://cidoc-crm.org/extensions/crmsci/html/CRMsci_v2.0.html#S4). All observations analyzing single footprints take place within the archaeological project (crmsci:S4_Observation -> cidoc-crm:P10_falls_within -> crmarchaeo:A9_Archaeological_Excavation).
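The bundling described above could look like this in a small Python sketch. The events are grouped by footprint, one S4 Observation is emitted per footprint, and each observation is linked to the excavation via P10_falls_within, as stated in the text. The text does not name the property that bundles the events. Using cidoc-crm:P9_consists_of for that, and crmsci:O8_observed for the observed footprint, are assumptions of this sketch. All instance names are invented.

```python
# Hypothetical sketch: one crmsci:S4_Observation per footprint bundles that
# footprint's assignment events and falls within the excavation
# (P10_falls_within, as in the text). P9_consists_of as the bundling property
# and O8_observed for the observed footprint are assumptions of this sketch.
from collections import defaultdict

EXCAVATION = "data:excavation_1"

# (event, footprint) pairs, e.g. taken from P140_assigned_attribute_to triples.
event_targets = [
    ("data:event_1", "data:fp_001"),
    ("data:event_2", "data:fp_001"),
    ("data:event_3", "data:fp_002"),
]


def bundle_observations(pairs, excavation):
    """Group events by footprint and emit one observation per footprint."""
    by_footprint = defaultdict(list)
    for event, footprint in pairs:
        by_footprint[footprint].append(event)

    triples = []
    for n, (footprint, events) in enumerate(sorted(by_footprint.items()), start=1):
        obs = f"data:observation_{n}"
        triples.append((obs, "a", "crmsci:S4_Observation"))
        triples.append((obs, "crmsci:O8_observed", footprint))
        triples.append((obs, "cidoc-crm:P10_falls_within", excavation))
        triples += [(obs, "cidoc-crm:P9_consists_of", e) for e in events]
    return triples


triples = bundle_observations(event_targets, EXCAVATION)
for triple in triples:
    print(triple)
```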
The data model is adaptable and extensible for research data focused on the documentation and interpretation of prehistoric footprints in caves. Data stored as semantic triples can be integrated into various data pools, regardless of their underlying structures. The model also supports scenarios in which multiple archaeological projects study the same footprints. A key issue is the lack of unique identifiers for footprints: a UID system is crucial to ensure that different projects can reliably refer to the same footprint. WissKI provides a non-semantic URI that uniquely identifies each node, enabling unambiguous access to individual footprints and their observations.
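The text leaves open how such a UID system would work. One possible design, sketched here in Python and not taken from the project, is a registry that mints one shared UID per footprint (here a urn:uuid URN) and registers each project's local label as an alias, so that observations from different projects can be merged. Project names, labels and observation names are invented.

```python
# Hypothetical sketch of a UID registry (one possible design, not the
# project's): each footprint gets one project-independent UID and every
# project maps its own local label to it. Project names and labels are invented.
import uuid
from collections import defaultdict


class FootprintRegistry:
    def __init__(self):
        self._uid_by_alias = {}

    def register(self, project, local_id, uid=None):
        """Map (project, local_id) to a UID, minting a new one if none is given."""
        key = (project, local_id)
        if key in self._uid_by_alias:
            existing = self._uid_by_alias[key]
            if uid is not None and uid != existing:
                raise ValueError(f"{key} is already mapped to {existing}")
            return existing
        if uid is None:
            uid = f"urn:uuid:{uuid.uuid4()}"
        self._uid_by_alias[key] = uid
        return uid

    def resolve(self, project, local_id):
        return self._uid_by_alias[(project, local_id)]


registry = FootprintRegistry()
uid = registry.register("project_A", "FP 12")
registry.register("project_B", "T-0087", uid=uid)  # same footprint, second project

# Observations recorded under each project's local label.
observations = [
    ("project_A", "FP 12", "observation_A1"),
    ("project_B", "T-0087", "observation_B1"),
]
merged = defaultdict(list)
for project, local_id, obs in observations:
    merged[registry.resolve(project, local_id)].append(obs)

print(dict(merged))  # one UID -> ['observation_A1', 'observation_B1']
```

Keeping the shared UID separate from project-local labels means no project has to rename its own records in order to take part.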
See also the published dataset containing the data used to develop the data model (DOI: 10.5281/zenodo.17725251).
Files
| Name | Size | MD5 checksum |
|---|---|---|
| caa2026_lalbers20260331.pdf | 3.4 MB | 364af73124068c7ca248c669032924c5 |
Additional details
Related works
- Describes: Dataset 10.5281/zenodo.17725251 (DOI)