Published November 21, 2025 | Version 1.1.0
Dataset | Open access

BALT: Babylonian Administrative and Legal Texts

  • 1. University of Helsinki
  • 2. University of California, Berkeley

Description

This repository contains automatically lemmatized Babylonian cuneiform texts from the Neo-Babylonian, Persian, and Hellenistic periods (c. 626–93 BCE). More than half of the transliterated texts are legacy data from the late János Everling, one of the pioneers in making transliterated cuneiform texts available online. The other texts were transliterated by Johannes Hackl, Bojana Janković, Michael Jursa, Yuval Levavi, Martina Schmidl, and Caroline Waerzeggers, whom we thank for permission to publish their texts online. We converted the transliterations into Oracc ATF, and we are naturally responsible for any errors introduced into the transliterations during the conversion. Finally, we thank Niek Veldhuis and Heidi Jauhiainen for their help at various stages of the project, especially for writing a couple of useful scripts for data manipulation.
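In lemmatized Oracc ATF, a transliteration line is followed by a `#lem:` line that gives one semicolon-separated lemma per word, in the form citation form[guide word]POS. The Python sketch below pairs words with their lemmata. The sample text is invented for illustration and is not taken from BALT, and the function `pair_lemmata` is hypothetical. It also assumes a strict one-to-one alignment between words and lemma entries, which real ATF does not always guarantee.

```python
import re

# Invented ATF fragment for illustration only; not taken from BALT.
SAMPLE_ATF = """&X000001 = Example text
#atf: lang akk
@obverse
1. 1 GIN₂ KU₃.BABBAR
#lem: n; šiqlu[shekel]N; kaspu[silver]N
"""

# Matches a numbered transliteration line such as "1." or "3'." and its words.
LINE_LABEL = re.compile(r"^(\d+'*)\.\s+(.*)$")


def pair_lemmata(atf: str) -> list[tuple[str, str, str]]:
    """Return (line label, transliterated word, lemma) triples.

    Simplifying assumption: each #lem: line directly follows its
    transliteration line, and words and lemma entries align one-to-one.
    """
    pairs = []
    label, words = None, []
    for raw in atf.splitlines():
        line = raw.strip()
        match = LINE_LABEL.match(line)
        if match:
            label, words = match.group(1), match.group(2).split()
        elif line.startswith("#lem:") and label is not None:
            lemmata = [lem.strip() for lem in line[len("#lem:"):].split(";")]
            pairs.extend((label, word, lem) for word, lem in zip(words, lemmata))
            label, words = None, []
    return pairs


for row in pair_lemmata(SAMPLE_ATF):
    print(row)
```

Numbers are lemmatized as `n` in this convention, which is why the first word pairs with `n` rather than with a dictionary lemma.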

The texts were automatically lemmatized at the Centre of Excellence in Ancient Near Eastern Empires (University of Helsinki), funded by the Research Council of Finland (decision numbers 298647, 330727, and 352747). Linda Leinonen, Matias Sakko, Senja Salmi, and Repekka Uotila assisted in cleaning the data and creating metadata. The texts are also available in Korp (http://urn.fi/urn:nbn:fi:lb-2025022609), which supports extensive searches of the texts and presents the results as a KWIC (keyword in context) concordance list. Korp also provides statistics on the search results and lets users download them.
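Korp's own implementation is not described here. As a rough illustration of what a KWIC concordance row contains, the following Python sketch collects the left context, the hit, and the right context for each occurrence of a keyword in a token list. The function name `kwic` and the token list are invented for illustration; `PN` stands in for a personal name.

```python
def kwic(tokens: list[str], keyword: str, width: int = 3) -> list[tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each exact match.

    `width` is the number of tokens kept on each side of the hit.
    """
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


# Invented sequence of lemmata; not taken from BALT.
tokens = "n šiqlu kaspu ša PN ina muhhi PN".split()
for left, hit, right in kwic(tokens, "PN", width=2):
    # Right-align the left context so that the hits line up in one column.
    print(f"{left:>15} [{hit}] {right}")
```

Aligning every hit in a single column is what makes a KWIC list easy to scan for recurring formulae.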

During the project, we have created and gathered some basic metadata for all the texts in our corpus. We have used the existing metadata on CDLI (https://cdli.earth/) as our point of departure. We created new CDLI entries for 976 tablets and updated some 280 existing entries. Some CDLI metadata was created using data from the NaBuCCo project (https://nabucco.acdh.oeaw.ac.at/). We thank Kathleen Abraham, Michael Jursa, and Shai Gordin for giving us access to the NaBuCCo metadata.

The zip file BALT.zip contains the annotated texts, and the file Scripts contains some Python and Java scripts used for converting file formats and manipulating transliterations.
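The internal layout of BALT.zip is not described on this page. Assuming, without having checked, that the texts are stored as UTF-8 files with an `.atf` extension, the following minimal Python sketch uses the standard `zipfile` module to count the text headers (ATF lines beginning with `&`) in each member. The function name `count_atf_texts` is hypothetical.

```python
import sys
import zipfile
from collections import Counter


def count_atf_texts(archive) -> Counter:
    """Count ATF text headers (lines starting with '&') in each .atf member.

    `archive` may be a path or a binary file object. Assumption: the texts
    are UTF-8 encoded files whose names end in ".atf".
    """
    counts = Counter()
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.endswith(".atf"):
                continue  # skip directories and non-ATF files
            text = zf.read(name).decode("utf-8")
            counts[name] = sum(1 for line in text.splitlines() if line.startswith("&"))
    return counts


if __name__ == "__main__":
    # Usage: pass one or more archive paths on the command line.
    for path in sys.argv[1:]:
        for member, n in sorted(count_atf_texts(path).items()):
            print(f"{member}: {n} texts")
```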

For further information on the dataset, see Alstola, T., Sahala, A., Valk, J., & Ong, M. (2026). Semi-Automatic Annotation of Babylonian Cuneiform Texts. Journal of Open Humanities Data, 12(41). https://doi.org/10.5334/johd.494

Files

Files (1.7 MB)

  • BALT.zip: 1.7 MB, md5:595a70f55011592cfad3acad3fced0ec
  • 10.0 kB, md5:a310af0b1949e12bf6cfd44b9c2e7648
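Each file is listed with an MD5 checksum. One way to check a download against it is the following Python sketch, which uses the standard `hashlib` module. It assumes the archive was saved under its original name, BALT.zip. Only that checksum is mapped by name, because the listing does not give the name of the 10.0 kB file. The script name in the usage comment is hypothetical.

```python
import hashlib
import sys
from pathlib import Path

# Checksum from the file listing; only BALT.zip is mapped by name here.
EXPECTED_MD5 = {"BALT.zip": "595a70f55011592cfad3acad3fced0ec"}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so that it need not fit in memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_md5.py BALT.zip
    for arg in sys.argv[1:]:
        path = Path(arg)
        expected = EXPECTED_MD5.get(path.name)
        if expected is None:
            print(f"{path.name}: no checksum on record")
        else:
            status = "OK" if md5_of(path) == expected else "MISMATCH"
            print(f"{path.name}: {status}")
```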

Additional details

Related works

Is documented by
Journal article: 10.5334/johd.494 (DOI)
Is source of
Dataset: 10.5281/zenodo.15355780 (DOI)
Dataset: 10.5281/zenodo.15496287 (DOI)
Is supplement to
Model: 10.5281/zenodo.14978872 (DOI)
Dataset: 10.5281/zenodo.14223709 (DOI)

Funding

Research Council of Finland
Semantic domains in Akkadian texts 298647
Research Council of Finland
Empire and Village: Imperial Control Strategies and Local Responses in the Babylonian Countryside 330727
Research Council of Finland
Centre of Excellence in Ancient Near Eastern Empires / Consortium: ANEE 336673

Software

Programming language
Python, Java