Natas: A library for normalizing historical English
Description
A Python 3 library for processing historical English, covering normalization and OCR post-correction.
1. Cite
If you use the library, please cite one of the following publications depending on whether you used it for normalization or OCR correction.
1.1 Normalization
Mika Hämäläinen, Tanja Säily, Jack Rueter, Jörg Tiedemann, and Eetu Mäkelä. 2019. Revisiting NMT for Normalization of Early English Letters. In Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature.
1.2 OCR correction
Mika Hämäläinen and Simon Hengchen. 2019. From the Paft to the Fiiture: a Fully Automatic NMT and Word Embeddings Method for OCR Post-Correction. In Proceedings of Recent Advances in Natural Language Processing.
Files
Name | Size | MD5 |
---|---|---|
mikahama/natas-1.0.2.zip | 134.0 MB | fc56b802b37bc0acd3fb888057785019 |
Additional details
Related works
- Is supplement to: https://github.com/mikahama/natas/tree/1.0.2 (URL)