Challenges in Encoding Fragmentary Attested Languages

Marinetti, Anna; Murano, Francesca; Quochi, Valeria; Ballerini, Monica; Boschetti, Federico; Del Grosso, Angelo Mario; Piccini, Silvia; Rigobianco, Luca; Solinas, Patrizia; Zinzi, Mariarosaria; Mallia, Michele; Middei, Edoardo

doi:10.5281/zenodo.7876475

Published August 22, 2022 | Version v1

Poster | Open access

1. Università Ca' Foscari di Venezia
2. Università degli Studi di Firenze
3. Istituto di Linguistica Computazionale "A. Zampolli" - CNR

The ItAnt project investigates the languages of ancient Italy that are attested only through epigraphic evidence, focusing on Venetic, Oscan, Faliscan and the Celtic languages.

To this end, the project combines the traditional methods of historical linguistics with digital technologies, developing computational tools specifically designed to create a set of interrelated digital resources.

1st Challenge: TEI/EpiDoc Encoding

The inscriptions are collected in a digital corpus, managed in an archive that stores a formal representation of the texts encoded with the TEI/EpiDoc schema. The schema had to be extended to encode the peculiarities of these texts, such as an unusual ductus (direction of writing), and to describe some information in more detail, for example by keeping information on the language separate from information on the script.
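One way to keep language and script apart, sketched here under stated assumptions, is to record them as separate subtags of a BCP 47 `xml:lang` value on an EpiDoc-style edition `<div>` (for example `xve` for Venetic, from ISO 639-3, plus `Ital` for Old Italic, from ISO 15924) and to split them again when processing. This is not necessarily how ItAnt extends the schema; the helper names `edition_div` and `split_language_tag` are hypothetical and the transcription lines are placeholders.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# TEI namespace for elements; xml:lang lives in the reserved XML namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def tei(tag: str) -> str:
    """Return a TEI tag name in ElementTree's {namespace}local form."""
    return f"{{{TEI_NS}}}{tag}"


def edition_div(language: str, script: str, lines: list[str]) -> ET.Element:
    """Build an EpiDoc-style <div type="edition"> whose xml:lang joins an
    ISO 639-3 language code and an ISO 15924 script code (BCP 47 syntax)."""
    div = ET.Element(tei("div"), {"type": "edition", XML_LANG: f"{language}-{script}"})
    ab = ET.SubElement(div, tei("ab"))
    for n, text in enumerate(lines, start=1):
        lb = ET.SubElement(ab, tei("lb"), {"n": str(n)})
        lb.tail = text  # the transcription follows its line-beginning marker
    return div


def split_language_tag(tag: str) -> tuple[str, str | None]:
    """Recover the language and (optional) script subtags of a BCP 47 tag."""
    primary, *rest = tag.split("-")
    # In BCP 47 a script subtag is exactly four letters.
    script = next((sub for sub in rest if len(sub) == 4 and sub.isalpha()), None)
    return primary, script


# Placeholder transcription lines, not real readings.
div = edition_div("xve", "Ital", ["PLACEHOLDER line 1", "PLACEHOLDER line 2"])
print(ET.tostring(div, encoding="unicode"))
print(split_language_tag(div.get(XML_LANG)))  # ('xve', 'Ital')
```

Using the standard tag syntax keeps the combined value valid for ordinary TEI tooling while still letting a query filter by script alone.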

The archive is enriched with standard metadata describing linguistic and material information.

ItAnt is also experimenting with Domain-Specific Languages (DSLs) to build a system that assists scholars in creating the digital textual resources.
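As a toy illustration of what a domain-specific notation can offer, the sketch below translates a single line of simplified Leiden notation, in which text restored by the editor in a lacuna is written in square brackets, into EpiDoc-style `<supplied reason="lost">` markup. The notation subset, the function `leiden_line_to_ab`, and the omission of the TEI namespace are assumptions made for brevity; it does not describe the DSLs ItAnt is actually experimenting with.

```python
import re
import xml.etree.ElementTree as ET

# Leiden convention: text restored by the editor in a lacuna goes in [ ].
LOST = re.compile(r"\[([^\[\]]*)\]")


def _checked(chunk: str) -> str:
    """Reject plain text that still contains an unmatched square bracket."""
    if "[" in chunk or "]" in chunk:
        raise ValueError(f"unbalanced square bracket in {chunk!r}")
    return chunk


def _append_text(parent: ET.Element, text: str) -> None:
    """Add text after the last child of parent, or as its leading text."""
    if not text:
        return
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def leiden_line_to_ab(line: str) -> ET.Element:
    """Translate one line of simplified Leiden notation into an <ab> element,
    mapping [text] to <supplied reason="lost">text</supplied>."""
    ab = ET.Element("ab")
    pos = 0
    for match in LOST.finditer(line):
        _append_text(ab, _checked(line[pos:match.start()]))
        supplied = ET.SubElement(ab, "supplied", {"reason": "lost"})
        supplied.text = match.group(1)
        pos = match.end()
    _append_text(ab, _checked(line[pos:]))
    return ab


print(ET.tostring(leiden_line_to_ab("ab[cd]ef"), encoding="unicode"))
# <ab>ab<supplied reason="lost">cd</supplied>ef</ab>
```

The appeal of such a notation is that scholars keep typing familiar editorial conventions while the tool produces schema-conformant markup and flags malformed input.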

2nd Challenge: Describing the Lexicon

ItAnt is developing a multilingual computational lexicon that provides a structured, formal representation of lexical items and their related information, and that also allows semantic access to the corpus. Traditional lemmatisation methods do not suit fragmentarily attested languages: in many cases the relation between a word form and its lemma is hard to establish, for reasons such as differing graphic standards, difficult linguistic analysis and incomplete paradigms. Sense representation is also problematic, since meanings can often be reconstructed only partially and hypothetically.
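A minimal sketch of a lexicon structure that tolerates these uncertainties, assuming a hypothetical three-level certainty scale: an attested form may point to several candidate lemmas, each link carries its own certainty and justification, and each sense is flagged as hypothetical by default. The class names and identifiers are illustrative placeholders, not ItAnt's data model.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Certainty(IntEnum):
    """Hypothetical ordered scale for how secure an analysis is."""
    UNCERTAIN = 1
    PROBABLE = 2
    CERTAIN = 3


@dataclass
class Sense:
    gloss: str
    hypothetical: bool = True  # meanings are often only partially reconstructable


@dataclass
class Lemma:
    lemma_id: str
    senses: list[Sense] = field(default_factory=list)


@dataclass
class LemmaLink:
    lemma: Lemma
    certainty: Certainty
    note: str = ""  # e.g. why the analysis is doubtful


@dataclass
class Form:
    """An attested written form with zero or more candidate lemmas."""
    written: str
    inscription_ids: list[str] = field(default_factory=list)
    links: list[LemmaLink] = field(default_factory=list)

    def candidate_lemmas(self, minimum: Certainty = Certainty.UNCERTAIN) -> list[str]:
        """Lemma IDs whose link is at least `minimum`, most certain first."""
        kept = [link for link in self.links if link.certainty >= minimum]
        kept.sort(key=lambda link: link.certainty, reverse=True)
        return [link.lemma.lemma_id for link in kept]


# Placeholder identifiers, not real Venetic, Oscan or Faliscan data.
lemma_a = Lemma("lemma-A", [Sense("reconstructed gloss A")])
lemma_b = Lemma("lemma-B")
form = Form(
    "FORM-1",
    ["inscription-001"],
    [
        LemmaLink(lemma_b, Certainty.UNCERTAIN, "incomplete paradigm"),
        LemmaLink(lemma_a, Certainty.PROBABLE, "different graphic standard"),
    ],
)
print(form.candidate_lemmas())                    # ['lemma-A', 'lemma-B']
print(form.candidate_lemmas(Certainty.PROBABLE))  # ['lemma-A']
```

Keeping certainty on the link rather than on the form or lemma lets the same form carry competing analyses without forcing a single lemmatisation.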

3rd Challenge: Applying an Ontological Model

ItAnt is testing the CRMinf and CRMtex extensions of CIDOC CRM, the de facto standard ontology in the Digital Humanities, to represent the texts and their scholarly interpretation in a semantic format. This is the first attempt to treat the full range of information concerning epigraphic material with this ontology.
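The core CRMinf pattern for recording an interpretation can be sketched with plain Python tuples: an `I2 Belief` is linked through `J4 that` to an `I4 Proposition Set` containing the claim (kept here in a named graph) and through `J5 holds to be` to an `I6 Belief Value`. The `crminf:` names follow CRMinf's class and property codes, but the prefix bindings, the `ex:` identifiers, the example claim and both helper functions are hypothetical; a real deployment would use an RDF library and the official namespaces.

```python
from typing import Optional

# (subject, predicate, object, graph); graph None means the default graph.
Quad = tuple[str, str, str, Optional[str]]


def belief_quads(belief: str, proposition: str, value: str,
                 claim: tuple[str, str, str]) -> list[Quad]:
    """Model 'belief holds proposition to be value' with CRMinf names; the
    claim itself is stored in a named graph identified by the proposition."""
    s, p, o = claim
    return [
        (belief, "rdf:type", "crminf:I2_Belief", None),
        (proposition, "rdf:type", "crminf:I4_Proposition_Set", None),
        (value, "rdf:type", "crminf:I6_Belief_Value", None),
        (belief, "crminf:J4_that", proposition, None),
        (belief, "crminf:J5_holds_to_be", value, None),
        (s, p, o, proposition),
    ]


def claims_with_value(quads: list[Quad], value: str) -> list[tuple[str, str, str]]:
    """Return the claims of every proposition set some belief holds to be `value`."""
    propositions = {
        o for s, p, o, g in quads
        if p == "crminf:J4_that" and (s, "crminf:J5_holds_to_be", value, None) in quads
    }
    return [(s, p, o) for s, p, o, g in quads if g in propositions]


quads = belief_quads(
    "ex:belief_1", "ex:proposition_1", "ex:probably_true",
    ("ex:inscription_001", "ex:hasLanguage", "ex:Venetic"),
)
print(claims_with_value(quads, "ex:probably_true"))
# [('ex:inscription_001', 'ex:hasLanguage', 'ex:Venetic')]
```

Separating the claim from the belief about it is what allows several, possibly conflicting, scholarly positions on the same inscription to coexist in one graph.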

4th Challenge: Integrating Data

Finally, ItAnt will interlink the different datasets, creating a hub that primarily integrates the lexicon and the transcriptions of the inscriptions, together with contextual metadata, bibliography and, experimentally, the hermeneutic positions.
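A minimal sketch of such a hub, assuming that every resource can be tied to a shared inscription identifier: lexicon entries, bibliographic references and interpretations are grouped under the inscription they refer to, and links pointing to an unknown inscription or kind are reported rather than silently dropped. The identifiers, the link kinds in `KINDS` and the `build_hub` function are hypothetical.

```python
# Hypothetical kinds of resource that can be attached to an inscription.
KINDS = ("lexicon", "bibliography", "interpretations")


def build_hub(
    inscriptions: dict[str, dict],
    links: list[tuple[str, str, str]],
) -> tuple[dict[str, dict], list[tuple[str, str, str]]]:
    """Group linked resources under their inscription ID.

    Each link is (kind, resource_id, inscription_id). Links with an unknown
    kind or inscription are returned separately so they can be checked.
    """
    hub = {
        iid: {"metadata": meta, **{kind: [] for kind in KINDS}}
        for iid, meta in inscriptions.items()
    }
    unresolved = []
    for kind, resource_id, iid in links:
        if kind in KINDS and iid in hub:
            hub[iid][kind].append(resource_id)
        else:
            unresolved.append((kind, resource_id, iid))
    return hub, unresolved


# Hypothetical identifiers for illustration only.
hub, unresolved = build_hub(
    {"inscription-001": {"language": "xve", "script": "Ital"}},
    [
        ("lexicon", "form-FORM-1", "inscription-001"),
        ("bibliography", "ref-001", "inscription-001"),
        ("interpretations", "ex:belief_1", "inscription-001"),
        ("lexicon", "form-FORM-2", "inscription-999"),
    ],
)
print(hub["inscription-001"]["lexicon"])  # ['form-FORM-1']
print(unresolved)  # [('lexicon', 'form-FORM-2', 'inscription-999')]
```

Reporting unresolved links makes inconsistencies between independently maintained datasets visible at integration time.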

5th Challenge: Preservation

Tools and resources developed within the project will be made available through relevant Europe-wide research infrastructures, such as CLARIN and DARIAH. This will ensure both the long-term preservation of the resources produced and a wider valorisation of this heritage.

Notes

This work is supported by the Ministero dell'Università e della Ricerca, Italy [PRIN 2017XJLE8J].

Files

poster_CIEGL_2022.pdf (11.6 MB, md5:31c9edcab4edca1870244dd3b0dabfc0)