Published November 29, 2022 | Version v0.3.4
Software
Open
Living-with-machines/alto2txt
Authors/Creators
- 1. The Alan Turing Institute
- 2. EPCC, The University of Edinburgh
- 3. British Library
- 4. The Alan Turing Institute; RiCEP, Academy of Finland
Description
alto2txt: Extract plain text from newspapers
Converts XML publications (in METS 1.8/ALTO 1.4, METS 1.3/ALTO 1.4, BLN or UKP format) to plaintext articles and generates minimal metadata.
Full documentation and demo instructions.
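As a rough illustration of what extracting plain text from ALTO XML involves, the sketch below joins the `CONTENT` attributes of `String` elements within each `TextLine`. It is not alto2txt's own implementation: the function names `local_name` and `alto_to_lines` and the sample document are hypothetical, and METS handling, article segmentation and metadata generation are all omitted.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, heavily simplified ALTO fragment used only for this sketch.
SAMPLE_ALTO = """
<alto>
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Extract"/>
            <SP/>
            <String CONTENT="plain"/>
          </TextLine>
          <TextLine>
            <String CONTENT="text"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix so namespaced and bare tags match."""
    return tag.rsplit("}", 1)[-1]


def alto_to_lines(xml_text: str) -> List[str]:
    """Return one plain-text string per ALTO TextLine element."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Each word is stored in the CONTENT attribute of a String child.
        words = [
            child.attrib.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


print("\n".join(alto_to_lines(SAMPLE_ALTO)))
```

Matching on the local tag name rather than a fixed namespace is a choice made for this sketch so that the same code reads documents with or without a namespace declaration.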
Added
- Added PyPI version and MIT license badges to README.md
- Added pytest-cov with default options to assess documentation
- Added isort to .pre-commit-config.yaml to sort import consistency
- Added pycln to .pre-commit-config.yaml to check unused imports
- Added pycln configuration to pyproject.toml
- Added alto2txt as a command line script in pyproject.toml
Changed
- Switch from Apache v2.0 license to MIT license, in line with project recommendations.
- Updated mypy in .pre-commit-config.yaml
Deprecated
- Replace extract_publications_text.py with the alto2txt command line interface script specified in pyproject.toml
Removed
- setup.py
- requirements.txt
Fixed
- Fixed python = ">3.6.0" in pyproject.toml rather than >3.7 for consistency with documentation
- Fixed licensing ambiguity (now all should be MIT)
- Fixed typos in README.md
- Fixed superfluous imports via pycln in pre-commit
Files
(1.0 MB)
| Name | Size |
|---|---|
| Living-with-machines/alto2txt-v0.3.4.zip (md5:948ce84fac76ea3d50ccfe202633b38b) | 1.0 MB |
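The md5 value listed for the archive can be used to check a downloaded copy. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_expected` and the local file name in the usage note are assumptions for illustration, not part of alto2txt.

```python
import hashlib

# Checksum as listed for the v0.3.4 archive in this record.
EXPECTED_MD5 = "948ce84fac76ea3d50ccfe202633b38b"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True when the file's md5 equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("alto2txt-v0.3.4.zip")` would check an archive saved under that (assumed) name in the current directory.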
Additional details
Related works
- Is documented by
- Software documentation: https://living-with-machines.github.io/alto2txt (URL)
- Is supplement to
- Software: https://github.com/Living-with-machines/alto2txt/tree/v0.3.4 (URL)
Funding
- UK Research and Innovation
- Living with Machines AH/S01179X/1