Published March 6, 2025 | Version 1.0.0
Model Open

Neo-Babylonian Model for BabyLemmatizer 2.1

Description

This repository contains a Neo-Babylonian model for Aleksi Sahala's BabyLemmatizer 2.1. This model is used for lemmatizing and POS-tagging the texts published as Linguistically Annotated Achemenet Babylonian Texts and BALT: Babylonian Administrative and Legal Texts. The training data consists of first-millennium Babylonian texts from Oracc.

The research project has been carried out at the Centre of Excellence in Ancient Near Eastern Empires (University of Helsinki), funded by the Research Council of Finland (decision numbers 298647, 330727, and 352747).

For further information on the dataset, see Alstola, T., Sahala, A., Valk, J., & Ong, M. (2026). Semi-Automatic Annotation of Babylonian Cuneiform Texts. Journal of Open Humanities Data, 12(41). https://doi.org/10.5334/johd.494

Files

lbach0.zip (244.4 MB)
md5:6d9d86a9e23503dab4b7ba00341d87ec
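
The MD5 checksum listed for lbach0.zip can be used to confirm that the archive downloaded intact before loading the model. The following is a minimal sketch using only the Python standard library. The function names `md5_of_file` and `verify_download` and the local path `lbach0.zip` are illustrative assumptions and are not part of BabyLemmatizer.

```python
import hashlib
from pathlib import Path

# Checksum as published on this record for lbach0.zip.
EXPECTED_MD5 = "6d9d86a9e23503dab4b7ba00341d87ec"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical location: assumes the archive was saved to the working directory.
    archive = Path("lbach0.zip")
    if not archive.exists():
        print(f"{archive} not found")
    elif verify_download(archive):
        print("checksum OK")
    else:
        print("checksum mismatch: download the archive again")
```

Reading in chunks keeps memory use flat for an archive of this size. MD5 is suitable here only as an integrity check against transfer errors, not as protection against tampering.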

Additional details

Related works

Is documented by
Journal article: 10.5334/johd.494 (DOI)
Is supplement to
Software: https://github.com/asahala/BabyLemmatizer (URL)
Dataset: 10.5281/zenodo.14223709 (DOI)
Dataset: 10.5281/zenodo.14186072 (DOI)

Funding

Research Council of Finland
Semantic domains in Akkadian texts 298647
Research Council of Finland
Empire and Village: Imperial Control Strategies and Local Responses in the Babylonian Countryside 330727
Research Council of Finland
Centre of Excellence in Ancient Near Eastern Empires / Consortium: ANEE 336673

Software

Repository URL
https://github.com/asahala/BabyLemmatizer
Programming language
Python
Development Status
Active