Published January 10, 2026 | Version v2
Dataset | Open access

ArmEpiC – Armenian Epigraphic Corpus (ArtsakhEpiC Sub-Corpus, v1.0)

  • 1. Formation Continue UNIL-EPFL
  • 2. Dragomanov Ukrainian State University
  • 3. Yerevan Brusov State University of Languages and Social Sciences
  • 4. Institute of Archeology and Ethnography of the NAS RA
  • 5. Mesrop Mashtots Institute of Ancient Manuscripts
  • 6. École Polytechnique Fédérale de Lausanne

Contributors

  • 1. École Polytechnique Fédérale de Lausanne

Description

ArmEpiC: Methodology and Data Description 

Abstract

ArmEpiC (Armenian Epigraphic Corpus) is a digital scholarly dataset comprising diplomatically transcribed Armenian lapidary inscriptions encoded in TEI/EpiDoc (v9.7), together with a system of authority files designed to preserve epigraphic evidence while enabling analytical interoperability. The dataset is intended for reuse by epigraphers, historians, linguists, and digital heritage researchers requiring transparent, machine-readable epigraphic data.

Scope of the Dataset

The Zenodo deposit includes ten TEI/EpiDoc inscription files, authority files (ListPlace, ListMonument, ListSubMonument, ListMaterial, ListPreservation, ListScript, ListAbbreviationType, ListChronology, ListBibl), this methodology document, a README, and a licensing statement.

Conceptual Separation of Evidence and Interpretation

ArmEpiC enforces a strict separation between epigraphic evidence, editorial observation, and interpretive layers. The diplomatic transcription constitutes the primary evidentiary layer; all analytical and interpretive interventions are explicitly encoded and remain reversible.

Diplomatic Transcription Policy

Original orthography is preserved, lineation follows the stone, and no silent normalization is introduced. Editorial intervention is restricted to explicit expansion of abbreviations, explicit supply of omitted letters, and explicit marking of damage or loss.
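Because every intervention is marked explicitly, both the text as it stands on the stone and the edited text can be regenerated from the same file. The Python sketch below illustrates this with the standard library only. It assumes common EpiDoc conventions (`<lb/>` for line breaks, `<expan>`/`<abbr>`/`<ex>` for expanded abbreviations, `<supplied>` for supplied letters, `<gap>` for loss); the sample fragment, its Latin placeholder letters, and the function names are hypothetical and not taken from the ArmEpiC files.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical EpiDoc-style fragment; Latin placeholder letters stand in for Armenian text.
SAMPLE = """<ab xmlns="http://www.tei-c.org/ns/1.0">
  <lb n="1"/><expan><abbr>AB</abbr><ex>CD</ex></expan> EF
  <lb n="2"/>GH<supplied reason="omitted">I</supplied>JK<gap reason="lost" extent="unknown" unit="character"/>
</ab>"""


def _walk(elem: ET.Element, editorial: bool) -> str:
    """Serialize one element plus its tail, keeping or dropping editorial additions."""
    tag = elem.tag.replace(TEI, "")
    if tag == "lb":
        out = "\n"  # a line break on the stone
    elif tag == "gap":
        out = "[...]"  # loss is marked in both readings
    elif tag == "ex":
        # Editorial expansion of an abbreviation: not present on the stone.
        out = f"({''.join(elem.itertext())})" if editorial else ""
    elif tag == "supplied":
        # Editorially supplied letters: Leiden angle brackets for omissions, square otherwise.
        text = "".join(elem.itertext())
        left, right = ("⟨", "⟩") if elem.get("reason") == "omitted" else ("[", "]")
        out = f"{left}{text}{right}" if editorial else ""
    else:
        out = (elem.text or "") + "".join(_walk(child, editorial) for child in elem)
    return out + (elem.tail or "")


def render(root: ET.Element, editorial: bool) -> str:
    """Return the text line by line, collapsing XML indentation whitespace."""
    lines = (" ".join(line.split()) for line in _walk(root, editorial).split("\n"))
    return "\n".join(line for line in lines if line)


root = ET.fromstring(SAMPLE)
print(render(root, editorial=False))  # diplomatic reading: only what is on the stone
print(render(root, editorial=True))   # edition: interventions made visible
```

A fuller version would also need to handle elements such as `<lb break="no"/>`, `<choice>` and `<unclear>`, which this sketch ignores.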

Graphic Phenomena and Linguistic Structure

Ligatures are treated as graphic phenomena and do not determine linguistic segmentation. Where a ligature joins letters belonging to two different words, it is encoded as a graphic feature and the two words remain separate lexical units.
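One way to keep the graphic and linguistic layers apart is to record ligatures as links between letter positions, so that tokenization never consults them. The Python sketch below is a hypothetical data model, not ArmEpiC's encoding (in EpiDoc files ligatured letters are usually marked with `<hi rend="ligature">`); the class names and placeholder letters are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Position = Tuple[int, int]  # (word index, letter index within that word)


@dataclass
class Ligature:
    """A graphic fact: letters written as a single joined sign."""
    members: Tuple[Position, ...]


@dataclass
class Line:
    words: List[str]  # lexical units, as attested
    ligatures: List[Ligature] = field(default_factory=list)

    def tokens(self) -> List[str]:
        # Linguistic segmentation reads only the words; ligatures play no part.
        return list(self.words)

    def cross_word_ligatures(self) -> List[Ligature]:
        # Ligatures whose letters belong to more than one word.
        return [lig for lig in self.ligatures if len({w for w, _ in lig.members}) > 1]

    def glyph(self, lig: Ligature) -> str:
        # Reassemble the letters a ligature joins, e.g. for display or search.
        return "".join(self.words[w][i] for w, i in lig.members)


line = Line(words=["AB", "CD"], ligatures=[Ligature(((0, 1), (1, 0)))])
print(line.tokens())                     # two separate words
print(len(line.cross_word_ligatures()))  # one ligature spans the word boundary
print(line.glyph(line.ligatures[0]))     # letters B and C, joined graphically
```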

Abbreviations and Omitted Letters

A strict distinction is maintained between abbreviations (intentional, conventional shortenings) and omitted letters (letters left out by the engraver and supplied by the editor from context). Ambiguous cases are flagged rather than silently resolved. Honorific and graphic abbreviations are distinguished analytically via a controlled vocabulary (ListAbbreviationType).
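This distinction lends itself to automatic auditing. The sketch below assumes usual EpiDoc practice: abbreviations as `<expan>` with the editorial part in `<ex>`, omitted letters as `<supplied reason="omitted">`, and uncertainty as `cert="low"`. The use of `@ana` to point to an abbreviation type, and the two type identifiers, are hypothetical stand-ins for whatever ListAbbreviationType actually defines.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical controlled vocabulary; real identifiers would come from ListAbbreviationType.
ABBREVIATION_TYPES = {"#abbr-honorific", "#abbr-graphic"}

SAMPLE = """<ab xmlns="http://www.tei-c.org/ns/1.0">
  <expan ana="#abbr-honorific"><abbr>AB</abbr><ex>CD</ex></expan>
  <expan ana="#abbr-graphic"><abbr>EF</abbr><ex cert="low">GH</ex></expan>
  <expan><abbr>MN</abbr><ex>OP</ex></expan>
  IJ<supplied reason="omitted">K</supplied>L
</ab>"""


def audit(root: ET.Element) -> dict:
    """Count abbreviations and omissions; list uncertain or untyped cases for review."""
    report = {"abbreviations": 0, "omissions": 0, "flagged": [], "untyped": []}
    for expan in root.iterfind(".//tei:expan", NS):
        report["abbreviations"] += 1
        label = "".join(expan.itertext()).strip()
        if expan.get("ana") not in ABBREVIATION_TYPES:
            report["untyped"].append(label)  # no controlled-vocabulary type assigned
        if any(ex.get("cert") == "low" for ex in expan.iterfind(".//tei:ex", NS)):
            report["flagged"].append(label)  # expansion explicitly marked uncertain
    for sup in root.iterfind(".//tei:supplied[@reason='omitted']", NS):
        report["omissions"] += 1
        if sup.get("cert") == "low":
            report["flagged"].append("".join(sup.itertext()))
    return report


print(audit(ET.fromstring(SAMPLE)))
```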

Word Segmentation and Lemmatization

Each lexical unit is encoded as an independent word. Lemmatization is an analytical layer supplied in normalized Classical Armenian and does not imply correction of the original spelling.
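Keeping the attested form and the lemma in separate places allows search by lemma without touching the transcription. The sketch assumes TEI `<w>` elements carrying a `@lemma` attribute; the two words (an illustrative ԵՍ ... ՇԻՆԵՑԻ, "I ... built", with Classical Armenian lemmas) are sample data, not drawn from the corpus, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Illustrative sample: majuscule forms as on the stone, lemmas in normalized Classical Armenian.
SAMPLE = """<ab xmlns="http://www.tei-c.org/ns/1.0">
  <lb n="1"/><w lemma="ես">ԵՍ</w> <w lemma="շինեմ">ՇԻՆԵՑԻ</w>
</ab>"""

Pair = Tuple[str, Optional[str]]


def form_lemma_pairs(root: ET.Element) -> List[Pair]:
    """Pair the text of each <w> with its @lemma; the attested form is left as it is."""
    return [("".join(w.itertext()), w.get("lemma")) for w in root.iterfind(".//tei:w", NS)]


def lemma_index(pairs: List[Pair]) -> Dict[str, List[str]]:
    """Group attested spellings under their lemma, for lemma-based search."""
    index: Dict[str, List[str]] = defaultdict(list)
    for form, lemma in pairs:
        if lemma is not None:  # unlemmatized words are simply not indexed
            index[lemma].append(form)
    return dict(index)


pairs = form_lemma_pairs(ET.fromstring(SAMPLE))
print(pairs)
print(lemma_index(pairs))
```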

Names, Prosopography, and Places

Personal names are encoded structurally without imposing prosopographic identification. Place names are preserved as attested and linked to external authorities via ListPlace.
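Links from the texts to ListPlace can be checked by resolving every reference against the authority file. The sketch assumes, hypothetically, that `<placeName>` carries a `@ref` of the form `#id` pointing to an `xml:id` in a TEI `<listPlace>`; ArmEpiC may instead use URNs. All identifiers and names below are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

LIST_PLACE = """<listPlace xmlns="http://www.tei-c.org/ns/1.0">
  <place xml:id="place_0001"><placeName>PLACE-A</placeName></place>
</listPlace>"""

# The personal name is marked up structurally only, with no prosopographic link.
INSCRIPTION = """<ab xmlns="http://www.tei-c.org/ns/1.0">
  <persName><name>NAME</name></persName>
  <placeName ref="#place_0001">PLACE-A-ATTESTED</placeName>
  <placeName ref="#place_9999">PLACE-B</placeName>
</ab>"""


def place_links(inscription: ET.Element) -> List[Tuple[str, str]]:
    """(form as attested, reference) for every place name in the text."""
    return [
        ("".join(p.itertext()), p.get("ref", ""))
        for p in inscription.iterfind(".//tei:placeName", NS)
    ]


def unresolved(inscription: ET.Element, list_place: ET.Element) -> List[str]:
    """References that match no xml:id in the authority file."""
    known = {
        "#" + p.get(XML_ID)
        for p in list_place.iterfind(".//tei:place", NS)
        if p.get(XML_ID)
    }
    return [ref for _, ref in place_links(inscription) if ref not in known]


links = place_links(ET.fromstring(INSCRIPTION))
missing = unresolved(ET.fromstring(INSCRIPTION), ET.fromstring(LIST_PLACE))
print(links)
print(missing)
```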

Dating and Chronology

Dates are recorded as transmitted in the inscription, with Gregorian equivalents supplied as scholarly interpretation. The evidentiary basis of each date is made explicit.
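Where, as is common in Armenian inscriptions, a year is given in the Armenian era and written with alphabetic numerals, the transmitted value and its modern equivalent can be stored side by side. The sketch below assumes that case. It uses the conventional approximation CE year ≈ Armenian-era year + 551 (the era begins in 552 CE); because the Armenian year did not coincide with the Julian or Gregorian year, the exact equivalent can differ by one year depending on month and day, which is one reason to record the basis of each conversion. The function and field names are hypothetical.

```python
from dataclasses import dataclass

# Armenian alphabetic numerals: nine letters each for units, tens, hundreds, thousands.
ROWS = ("ԱԲԳԴԵԶԷԸԹ", "ԺԻԼԽԾԿՀՁՂ", "ՃՄՅՆՇՈՉՊՋ", "ՌՍՎՏՐՑՒՓՔ")
DIGITS = {letter: (i + 1, power) for power, row in enumerate(ROWS) for i, letter in enumerate(row)}

ERA_OFFSET = 551  # conventional approximation: Armenian-era year + 551 ≈ CE year


@dataclass(frozen=True)
class Dating:
    transmitted: str  # numeral exactly as written on the stone (evidence)
    era_year: int     # its numeric value
    approx_ce: int    # modern equivalent (interpretation)
    basis: str        # how the equivalent was obtained


def numeral_value(numeral: str) -> int:
    """Sum an additive numeral; each place may occur once, from largest to smallest."""
    try:
        digits = [DIGITS[ch] for ch in numeral]
    except KeyError as exc:
        raise ValueError(f"not an Armenian numeral letter: {exc.args[0]!r}") from None
    powers = [p for _, p in digits]
    if not digits or any(a <= b for a, b in zip(powers, powers[1:])):
        raise ValueError(f"malformed numeral: {numeral!r}")
    return sum(d * 10 ** p for d, p in digits)


def date_armenian_era(numeral: str) -> Dating:
    year = numeral_value(numeral)
    return Dating(numeral, year, year + ERA_OFFSET,
                  "Armenian era; +551 approximation, exact year depends on month")


print(date_armenian_era("ՈԾԵ"))  # 600 + 50 + 5 = 655 in the Armenian era
```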

Functional Classification

Each inscription is assigned a single dominant functional category as a heuristic analytical label.

Translation Strategy

Translations into Modern Armenian and English are provided as interpretive aids, prioritizing semantic accuracy. They do not replace the original text.

Authority Files

Each authority entity is assigned a persistent URN that is immutable once published. Authorities are aligned conceptually with international vocabularies to support interoperability.
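Immutability can be enforced at the point of publication: in the hypothetical model below, a URN once issued is never rebound to another entity or deleted, although its label may be refined and it may be retired in favour of a successor that it forwards to. The `urn:armepic:...` pattern is invented for illustration and is not ArmEpiC's actual identifier scheme.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    label: str
    replaced_by: Optional[str] = None


class AuthorityRegistry:
    """URNs can be added, relabelled, or retired, but never reassigned or deleted."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def publish(self, urn: str, label: str) -> None:
        if not urn.startswith("urn:"):
            raise ValueError(f"not a URN: {urn!r}")
        if urn in self._entries:
            raise ValueError(f"{urn} is already published and cannot be reassigned")
        self._entries[urn] = Entry(label)

    def relabel(self, urn: str, label: str) -> None:
        # Refining the human-readable label leaves the identifier untouched.
        self._entries[urn].label = label

    def retire(self, urn: str, successor: str) -> None:
        # A retired URN stays resolvable and forwards to its successor.
        if successor not in self._entries:
            raise KeyError(successor)
        self._entries[urn].replaced_by = successor

    def resolve(self, urn: str) -> str:
        """Follow forwarding links to the URN currently in use."""
        seen = set()
        while self._entries[urn].replaced_by is not None:
            if urn in seen:
                raise ValueError(f"forwarding cycle at {urn}")
            seen.add(urn)
            urn = self._entries[urn].replaced_by
        return urn


registry = AuthorityRegistry()
registry.publish("urn:armepic:place:0001", "PLACE-A")  # hypothetical URNs
registry.publish("urn:armepic:place:0002", "PLACE-A (merged record)")
registry.retire("urn:armepic:place:0001", "urn:armepic:place:0002")
print(registry.resolve("urn:armepic:place:0001"))
```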

XML Structure and Validation

All XML files were validated against the official TEI/EpiDoc 9.7 RELAX NG and Schematron schemas with standard XML validation tools before deposit on Zenodo. All xml:id values conform to the XML NCName constraints.
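Schema validation itself requires a RELAX NG and Schematron processor (for example Jing, or the `RelaxNG` class and `isoschematron` module of lxml). As a lightweight complementary pre-check, not a substitute, the sketch below uses only the standard library to test every `xml:id` against the NCName production of XML 1.0 (fifth edition) with the colon removed, and to report duplicates. The sample document and function name are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List, Tuple

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# NameStartChar / NameChar ranges from XML 1.0 (5th ed.), minus ':', which yields NCName.
_START = ("A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF"
          "\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF"
          "\uFDF0-\uFFFD\U00010000-\U000EFFFF")
_REST = _START + ".0-9\u00B7\u0300-\u036F\u203F-\u2040-"  # trailing '-' is a literal hyphen
NCNAME = re.compile(f"[{_START}][{_REST}]*")

SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <w xml:id="w1"/><w xml:id="բառ_2"/><w xml:id="3w"/><w xml:id="w1"/><w xml:id="a:b"/>
</TEI>"""


def check_ids(root: ET.Element) -> Tuple[List[str], List[str]]:
    """Return (ids that are not NCNames, ids used more than once), in document order."""
    ids = [el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None]
    invalid = [i for i in ids if not NCNAME.fullmatch(i)]
    duplicates = sorted(i for i, n in Counter(ids).items() if n > 1)
    return invalid, duplicates


invalid, duplicates = check_ids(ET.fromstring(SAMPLE))
print(invalid)     # a leading digit and a colon both violate NCName
print(duplicates)  # xml:id values must also be unique within a document
```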

Licensing and Versioning

The dataset is released under a Creative Commons Attribution–NonCommercial 4.0 International (CC BY-NC 4.0) license. This Zenodo deposit represents a fixed release; future revisions will receive new DOIs.

Conclusion

ArmEpiC provides a transparent, reversible, and interoperable digital epigraphic dataset grounded in Armenian scholarly tradition and international standards, enabling analytical reuse across disciplines.

The project has been funded by the National Association for Armenian Studies and Research (NAASR) and the Knights of Vartan Fund for Armenian Studies. 

 

*ArmEpiC (Armenian Epigraphic Corpus) is a scholarly research project initiated and curated under the chief editorship of Hamest Tamrazyan, with Gayane Hovhannisyan and Arsen Arutyunyan as editors.

*ArmEpiC is an evolving corpus. Authority files, identifiers, and encoding practices may be refined between versions.

Files

ArmEpiC_ListAbbreviationType_crm.xml

Files (24.0 MB)

Checksum                                Size
md5:aeca85170b30fae5d9b9f14ff9edf9d2    5.6 kB
md5:8095cce2565ccfb6994f70d481ae3a4a    10.1 kB
md5:f4095cfa16b3cde9741c58e049bd1ce0    14.2 kB
md5:ba302438178fc4580e0a8ff72e26e2c5    6.1 kB
md5:d58f9b9130a0cff178a9d0a88f520358    4.9 kB
md5:bd4d654252e43cf73cc415c448e8a7f2    9.5 kB
md5:ff442b61c14b1272a6f47b45c1f34fc9    15.3 kB
md5:46b8702532253321327b7eab3b444788    13.4 kB
md5:5dd117b2777e215bf025acd837848834    19.8 kB
md5:49f8fe9e2c68bba5d344958652d975ab    18.8 kB
md5:f545bf88dab1de9b2d9764c6ef110117    5.4 kB
md5:6b8e6541b09aa8a80a5ffae3428f997f    19.8 kB
md5:971f3ec0fe2b80d5bde696fc624efaa3    20.7 kB
md5:fec436add50454cfbee2ccd3f237af71    15.5 kB
md5:dfa5852bce73e3961cb204f289dfe6cf    16.5 kB
md5:de9ecfb1e0b9e509e0f0c9e59cbc636a    15.6 kB
md5:95352fe03c9c1f3aaa5de9554249c635    21.9 kB
md5:9c19f5c3c5bafb73fed16496a3f8ebae    17.7 kB
md5:88276b39970a6bc325521cb4606102aa    12.9 kB
md5:9996027258d93f54884136ab574aaf62    12.9 kB
md5:bc5edcb009640b57e0707d3e25ee413b    13.6 kB
md5:4caefa5afe12f947e56152c6e92df8e2    5.6 MB
md5:2385e1bcc6e376e71072c565b860a7f5    3.5 MB
md5:7af68ec3d586cc96a8278e23c76b004f    3.5 MB
md5:60d021ce624a9d4a1ad7ddf7cdd89a8c    1.3 MB
md5:cafd8e25193bd1004ad53e483fa8d0ab    3.4 MB
md5:d702b82743b50b6585f2afad2bef2d4d    1.6 MB
md5:7e07928704049237d177347fec461332    3.4 MB
md5:c7033fc1d2f70eca41595d016180d06d    244.4 kB
md5:32337929c036b560a129713fa7e66408    1.1 MB
md5:3a7ea6c1e84a855327b6b2371ae405e2    65.0 kB
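Each file in the deposit is listed with its MD5 checksum, so downloads can be verified locally. A minimal Python sketch using only `hashlib`; the helper names are hypothetical, and each downloaded file should be paired with the checksum Zenodo lists for it.

```python
import hashlib
import io
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_listed(path: str, listed: str) -> bool:
    """Compare a local file with a checksum written as 'md5:<hex>'."""
    algorithm, _, expected = listed.partition(":")
    if algorithm != "md5":
        raise ValueError(f"unsupported checksum type: {algorithm!r}")
    with open(path, "rb") as fh:
        return md5_of_stream(fh) == expected.strip().lower()


# Sanity check against the standard MD5 test vector for b"abc".
assert md5_of_stream(io.BytesIO(b"abc")) == "900150983cd24fb0d6963f7d28e17f72"
```

For example, `matches_listed("ArmEpiC_ListAbbreviationType_crm.xml", "md5:<checksum listed for that file>")` returns True only if the local copy is byte-identical to the deposited one.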

Additional details

Dates

Available
2026-01-09

References

  • Tamrazyan, Hamest; Hovhannisyan, Gayane; Harutyunyan, Arsen; Boros, Emanuela. (2025). ArmEpiC – Armenian Epigraphic Corpus (ArtsakhEpiC Sub-Corpus, v1.0) [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.18198118