Published November 29, 2022 | Version v0.3.4
Software
Open
Living-with-machines/alto2txt
Creators
- 1. The Alan Turing Institute
- 2. EPCC, The University of Edinburgh
- 3. British Library
- 4. The Alan Turing Institute; RiCEP, Academy of Finland
Description
alto2txt: Extract plain text from newspapers
Converts XML (in METS 1.8/ALTO 1.4, METS 1.3/ALTO 1.4, BLN or UKP format) publications to plaintext articles and generates minimal metadata.
Full documentation and demo instructions.
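To illustrate the kind of conversion described above, here is a minimal sketch of pulling plain text out of ALTO-style XML with the Python standard library. It is not alto2txt's own implementation. The sample document, its namespace value, and the helper names `local_name` and `alto_to_text` are assumptions made for illustration. It relies only on the ALTO convention that words are stored in the `CONTENT` attribute of `String` elements inside `TextLine` elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal ALTO-like sample. Real files are much larger and the
# namespace differs between ALTO versions; this value is illustrative only.
SAMPLE_ALTO = """<alto xmlns="http://schema.ccs-gmbh.com/ALTO">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Hello"/><SP/><String CONTENT="world"/>
    </TextLine>
    <TextLine>
      <String CONTENT="second"/><SP/><String CONTENT="line"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix so matching is namespace-agnostic."""
    return tag.rsplit("}", 1)[-1]


def alto_to_text(xml_text: str) -> str:
    """Join the CONTENT of String elements, one output line per TextLine."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        words = [
            child.attrib.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return "\n".join(lines)


print(alto_to_text(SAMPLE_ALTO))
```

The sketch ignores METS metadata, hyphenation, and article segmentation, all of which a real converter has to handle.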
Added
- Added PyPI version and MIT license badges to README.md
- Added pytest-cov with default options to assess documentation
- Added isort to .pre-commit-config.yaml to sort imports consistently
- Added pycln to .pre-commit-config.yaml to check unused imports
- Added pycln configuration to pyproject.toml
- Added alto2txt as a command line script in pyproject.toml
Changed
- Switched from Apache v2.0 license to MIT license, in line with project recommendations
- Updated mypy in .pre-commit-config.yaml
Deprecated
- Replace extract_publications_text.py with the alto2txt command line interface script specified in pyproject.toml
Removed
- setup.py
- requirements.txt
Fixed
- Fixed python = ">3.6.0" in pyproject.toml rather than >3.7 for consistency with documentation
- Fixed licensing ambiguity (now all should be MIT)
- Fixed typos in README.md
- Fixed superfluous imports via pycln in pre-commit
Files
(1.0 MB)
Name | Size |
---|---|
Living-with-machines/alto2txt-v0.3.4.zip (md5:948ce84fac76ea3d50ccfe202633b38b) | 1.0 MB |
Additional details
Related works
- Is documented by
  - Software documentation: https://living-with-machines.github.io/alto2txt
- Is supplement to
  - Software: https://github.com/Living-with-machines/alto2txt/tree/v0.3.4
Funding
- Living with Machines AH/S01179X/1
- UK Research and Innovation