Published July 22, 2022 | Version v0.25
Software
Open
LanguageMachines/ucto: v0.25
Authors/Creators
- 1. Radboud University
- 2. KNAW Humanities Cluster & CLST, Radboud University
Description
[Ko van der Sloot]
- Added a test for https://github.com/LanguageMachines/ucto/issues/87
- Adapted to latest update in tokconfig-fra (uctodata 0.9)
- Deal with unknown languages (as detected by ucto), using ISO 639-3 'und' (https://github.com/LanguageMachines/ucto/issues/86)
  - don't tokenize unknown languages
  - configurable sentence splitter for "und" text
  - added tests
- added code to set the separator (--separators), so ucto can split on more than just spaces
- migrated test wrapper to Python 3 (was still on 2.7)
[Maarten van Gompel]
- Set up a Dockerfile
- Added build-deps.sh to automatically download, build and install dependencies
- Updated software metadata (codemeta.json) to latest requirements as proposed in CLARIAH
- deprecated options -f and -x; they still work but are no longer advertised and give a deprecation notice (https://github.com/LanguageMachines/ucto/issues/88)
- textcat.cfg is now searched for in the user config dir as well as the global config; ucto can also run without textcat if the config is missing entirely (same as if textcat was not compiled in)
- added support for user-based configuration dirs ($XDG_CONFIG_HOME/ucto), which take precedence over global data dirs
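The lookup order described in the last two items can be illustrated with a minimal sketch. This is not ucto's own code (ucto is written in C++); the function names and the global directory path `/usr/share/ucto` are hypothetical, and the fallback to `$HOME/.config` follows the XDG Base Directory convention rather than anything stated in these notes.

```python
import os
from pathlib import Path

# Hypothetical global data directory; the real path depends on the installation.
GLOBAL_DIR = Path("/usr/share/ucto")


def user_config_dir(env=None):
    """Return the per-user ucto config dir, i.e. $XDG_CONFIG_HOME/ucto.

    Assumption: when XDG_CONFIG_HOME is unset or empty, fall back to
    $HOME/.config, as the XDG Base Directory convention prescribes.
    """
    env = os.environ if env is None else env
    base = env.get("XDG_CONFIG_HOME") or os.path.join(
        env.get("HOME", str(Path.home())), ".config"
    )
    return Path(base) / "ucto"


def find_config(name, global_dir=GLOBAL_DIR, env=None):
    """Look for `name` in the user dir first, then in the global dir.

    Returns the first existing file, or None when it is found nowhere.
    """
    for directory in (user_config_dir(env), Path(global_dir)):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None
```

In this sketch, a `None` result for `textcat.cfg` corresponds to the case in which the config is missing entirely and language detection would simply be skipped.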
Files
| Name | Size | Checksum |
|---|---|---|
| LanguageMachines/ucto-v0.25.zip | 512.5 kB | md5:091466ea2f1bdf9252b56d30ebd5d592 |
Additional details
Related works
- Is supplement to
- https://github.com/LanguageMachines/ucto/tree/v0.25 (URL)