Published October 1, 2025 | Version v1
Conference paper Open

Integrating ATR Software with University HPC Infrastructure: balancing diverse compute needs

  • 1. Princeton University

Description

Faculty and students in the humanities and social sciences, as well as university library professionals, are increasingly interested in automated text recognition (ATR) tools. Many also want to train or fine-tune the underlying machine learning models, because out-of-the-box tools often return subpar results for historical or otherwise low-resource languages. In this paper, we report on our contributions to implementing an open-source ATR platform (eScriptorium) on university hardware and in a Slurm-managed high-performance computing (HPC) environment. We describe the modifications required for deployment, authentication, and HPC integration, the decisions made regarding code modularity, and the strategies used to handle the diverse runtime and compute requirements of user-submitted model training tasks.
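
As a rough illustration of what handling diverse runtime and compute requirements can involve in a Slurm setting, the sketch below maps a training request to resource limits and renders a batch script. It is not taken from the htr2hpc code: `TrainingRequest`, `pick_resources`, `render_sbatch`, and the page-count thresholds are all hypothetical, and only the `#SBATCH` options (`--job-name`, `--time`, `--mem`, `--cpus-per-task`, `--gres`) are standard Slurm directives.

```python
from dataclasses import dataclass


@dataclass
class TrainingRequest:
    """Hypothetical description of a user-submitted training task."""
    job_name: str
    num_pages: int        # training-set size, used here as an assumed proxy for cost
    use_gpu: bool = True


def pick_resources(req: TrainingRequest) -> dict:
    """Choose wall time, memory, and CPU count from the training-set size.

    The tiers and thresholds are illustrative assumptions, not values
    reported in the paper.
    """
    if req.num_pages <= 50:
        return {"time": "00:30:00", "mem": "8G", "cpus": 2}
    if req.num_pages <= 500:
        return {"time": "04:00:00", "mem": "16G", "cpus": 4}
    return {"time": "24:00:00", "mem": "32G", "cpus": 8}


def render_sbatch(req: TrainingRequest, command: str) -> str:
    """Render a Slurm batch script as a string for the given request."""
    res = pick_resources(req)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={req.job_name}",
        f"#SBATCH --time={res['time']}",
        f"#SBATCH --mem={res['mem']}",
        f"#SBATCH --cpus-per-task={res['cpus']}",
    ]
    if req.use_gpu:
        # Request a single GPU only when the task needs one.
        lines.append("#SBATCH --gres=gpu:1")
    lines += ["", command]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder command; the real training invocation is not shown here.
    print(render_sbatch(TrainingRequest("finetune-demo", 120), "echo 'run training here'"))
```

A script rendered this way would typically be passed to `sbatch`. Tiering the requested wall time by dataset size is one simple way to avoid asking for a long allocation when a short fine-tuning run would do.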

Files

us-rse_htr2hpc.pdf (605.8 kB)
md5:41c37cedb644b2259bdeca4f7ddaa8e5

Additional details

Software

Repository URL
https://github.com/Princeton-CDH/htr2hpc/
Programming language
Python