Neo-Babylonian Model for BabyLemmatizer 2.1
Authors/Creators
Description
This repository contains a Neo-Babylonian model for Aleksi Sahala's BabyLemmatizer 2.1. This model is used for lemmatizing and POS-tagging the texts published as Linguistically Annotated Achemenet Babylonian Texts and BALT: Babylonian Administrative and Legal Texts. The training data consists of first-millennium Babylonian texts from Oracc.
The research project has been carried out at the Centre of Excellence in Ancient Near Eastern Empires (University of Helsinki), funded by the Research Council of Finland (decision numbers 298647, 330727, and 352747).
For further information on the dataset, see Alstola, T., Sahala, A., Valk, J., & Ong, M. (2026). Semi-Automatic Annotation of Babylonian Cuneiform Texts. Journal of Open Humanities Data, 12(41). https://doi.org/10.5334/johd.494.
Files
| Name | Size | Checksum |
|---|---|---|
| lbach0.zip | 244.4 MB | md5:6d9d86a9e23503dab4b7ba00341d87ec |
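Because the archive is large, it can be worth checking the download against the MD5 checksum listed above before unpacking it. The following is a minimal sketch, not part of BabyLemmatizer: the function names `md5_of_file` and `verify_archive` are hypothetical, and the script assumes the archive was saved as `lbach0.zip` in the current working directory.

```python
import hashlib
from pathlib import Path

# Checksum as listed for lbach0.zip on this record page.
EXPECTED_MD5 = "6d9d86a9e23503dab4b7ba00341d87ec"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that a large archive is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value.
    (Hypothetical helper name, used here for illustration only.)"""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the download was saved.
    archive = Path("lbach0.zip")
    if not archive.exists():
        print(f"{archive} not found; download it first.")
    elif verify_archive(archive):
        print("Checksum matches; the download looks intact.")
    else:
        print("Checksum mismatch; the file may be corrupted or incomplete.")
```

A mismatch usually points to an interrupted or corrupted download, in which case fetching the file again is the simplest remedy.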
Additional details
Related works
- Is documented by
  - Journal article: 10.5334/johd.494 (DOI)
- Is supplement to
  - Software: https://github.com/asahala/BabyLemmatizer (URL)
  - Dataset: 10.5281/zenodo.14223709 (DOI)
  - Dataset: 10.5281/zenodo.14186072 (DOI)
Funding
- Research Council of Finland
  - Semantic domains in Akkadian texts (298647)
- Research Council of Finland
  - Empire and Village: Imperial Control Strategies and Local Responses in the Babylonian Countryside (330727)
- Research Council of Finland
  - Centre of Excellence in Ancient Near Eastern Empires / Consortium: ANEE (336673)
Software
- Repository URL: https://github.com/asahala/BabyLemmatizer
- Programming language: Python
- Development status: Active