Integrating ATR Software with University HPC Infrastructure: balancing diverse compute needs
Description
Faculty and students in the humanities and social sciences, as well as university library professionals, are increasingly interested in automated text recognition (ATR) tools. There is also interest in training or fine-tuning the underlying machine learning models, because out-of-the-box tools often return subpar results for historical or otherwise low-resource languages. In this paper, we report on our contributions to the implementation of an open-source ATR platform (eScriptorium) on university hardware and in a Slurm-managed high-performance computing (HPC) environment. We comment on the modifications required for deployment, authentication, and HPC integration, on decisions regarding code modularity, and on strategies for handling the diverse runtime and compute requirements of user-submitted model training tasks.
Files
| Name | Size | Checksum |
|---|---|---|
| us-rse_htr2hpc.pdf | 605.8 kB | md5:41c37cedb644b2259bdeca4f7ddaa8e5 |
Additional details
Software
- Repository URL: https://github.com/Princeton-CDH/htr2hpc/
- Programming language: Python