Archiving a TEI project FAIRly
Description
The Inscriptions of Israel/Palestine Project (IIP) is an online corpus of inscriptions from Israel and Palestine, written in Hebrew, Greek, Latin and Aramaic, dating roughly from the Persian Period to the Arab Conquest. As of spring 2019, it has collected and encoded more than 4000 of some 10000 relevant texts. We aim to create an exhaustive and easily accessible collection and to enable users to carry out a variety of searches and extensive textual analysis.
The FAIR Principles aim to enhance the ability of machines to automatically find and use digital objects, in addition to supporting their reuse by individuals. The principles are organized under four areas intended to ensure that digital objects are findable, accessible, interoperable, and re-usable. Following epigraphy.info’s mission statement, we are applying the FAIR Principles to guide our development of archival formats and processes for our corpus.
As IIP prepared to deposit files in the Brown Digital Repository, we defined formats for ensuring that our files will be as informative, self-documenting and re-usable as possible. Each inscription is contained in a single XML file, encoded in the well-documented EpiDoc subset of the TEI. These working files, however, link to externally maintained controlled vocabularies (using the xi:include feature) and bibliography (using Zotero), in order to facilitate the work of our encoders and ensure consistency. One of our challenges was to incorporate these external data into a robust, stand-alone archival format.
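The step of folding externally maintained vocabularies into a stand-alone file can be sketched with the XInclude support in Python's standard library. This is a minimal illustration under stated assumptions, not the IIP workflow itself: the file name `include_taxonomies.xml`, the sample taxonomy, and the helper names `load_external` and `make_standalone` are all hypothetical, and the external resource is served from an in-memory dictionary so that the sketch runs without network or file access.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI_NS = "http://www.tei-c.org/ns/1.0"
XI_NS = "http://www.w3.org/2001/XInclude"

# Serialize TEI elements without a generated prefix (cosmetic only).
ET.register_namespace("", TEI_NS)

# Hypothetical stand-in for an externally maintained controlled vocabulary.
EXTERNAL_RESOURCES = {
    "include_taxonomies.xml": (
        f'<taxonomy xmlns="{TEI_NS}" xml:id="genre">'
        '<category xml:id="funerary"><catDesc>Funerary</catDesc></category>'
        "</taxonomy>"
    ),
}

# Hypothetical working file: the vocabulary is referenced, not embedded.
WORKING_FILE = f"""<TEI xmlns="{TEI_NS}" xmlns:xi="{XI_NS}">
  <teiHeader>
    <encodingDesc>
      <classDecl>
        <xi:include href="include_taxonomies.xml"/>
      </classDecl>
    </encodingDesc>
  </teiHeader>
</TEI>"""


def load_external(href, parse, encoding=None):
    """Loader for ElementInclude: resolve href from the in-memory store."""
    text = EXTERNAL_RESOURCES[href]
    if parse == "xml":
        return ET.fromstring(text)
    return text  # parse="text" inclusions are returned as plain strings


def make_standalone(xml_text):
    """Return a tree in which every xi:include is replaced by its target."""
    root = ET.fromstring(xml_text)
    ElementInclude.include(root, loader=load_external)
    return root


standalone = make_standalone(WORKING_FILE)
print(ET.tostring(standalone, encoding="unicode"))
```

In a real pipeline the loader would read the vocabulary files from disk or from their canonical location, and the bibliography would need a comparable step that copies the cited entries into each file.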
We will introduce the FAIR Guiding Principles and FAIR Metrics as they apply to epigraphic corpora and TEI encoding, discuss the roadmap for implementation, and look at archival practices beyond FAIR when it comes to preservation of data as well as re-use. The first steps to making a digital corpus findable and accessible seem straightforward: IIP texts have been ingested into the Brown Digital Repository, have unique and persistent identifiers and rich metadata, and are freely available. Even so, we can still improve on both facets. Simple interoperability and re-usability are available through the IIP API in both the production and the archival versions of the corpus; however, it will be important to do further work on controlled vocabularies, shared concepts, and encoding practices in order to enhance both of these facets.
Files
Name | Size | Checksum |
---|---|---|
TEI-FAIR-IIP-Slides-20191022.pdf | 22.8 MB | md5:9de74bb6f609c3654ff8832bf63f9459 |