bootphon/wordseg: wordseg-0.7
Description
Added tools/wordseg-qsub.sh, a script to schedule a list of segmentation jobs on a cluster running Sun Grid Engine and the qsub scheduler.
Added example phonological rules and updated the contributing guide in the documentation.
In wordseg-prep, empty lines are now ignored in both gold and segmented texts.
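Skipping empty lines in both outputs keeps the prepared text and the gold text aligned line by line, so utterance i in one still matches utterance i in the other. The sketch below illustrates that idea. It is not wordseg's API: `prepare` is a hypothetical helper. It assumes phones are separated by spaces and that syllable and word boundaries are marked with the tokens `;esyll` and `;eword`.

```python
def prepare(lines, word_sep=";eword", syll_sep=";esyll"):
    """Split phonologized lines into aligned (prepared, gold) outputs.

    Hypothetical helper: phones are space-separated and the word/syllable
    separator tokens are assumed to be ";eword" and ";esyll".
    """
    prepared, gold = [], []
    for line in lines:
        if not line.strip():
            continue  # empty or whitespace-only line: skipped in BOTH outputs
        # one list of phones per word, with syllable markers dropped
        words = [[p for p in chunk.split() if p != syll_sep]
                 for chunk in line.split(word_sep)]
        words = [w for w in words if w]  # a trailing separator leaves an empty chunk
        prepared.append(" ".join(p for w in words for p in w))  # phones, no boundaries
        gold.append(" ".join("".join(w) for w in words))        # one token per word
    return prepared, gold


if __name__ == "__main__":
    text = ["hh ax ;esyll l ow ;esyll ;eword w er l d ;esyll ;eword", "", "ay ;esyll ;eword"]
    prepared, gold = prepare(text)
    print(prepared)  # ['hh ax l ow w er l d', 'ay']
    print(gold)      # ['hhaxlow werld', 'ay']
```

Because the empty-line check runs before anything is appended, the two lists always have the same length.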
In wordseg-syll, syllabification is improved: words with no vowel are now syllabified, and error messages are clearer (see #35, #36).
In wordseg-tp, added the mutual information dependency measure. In the bash command, the argument --probability {forward,backward} is replaced by --dependency {ftp,btp,mi} (--probability is maintained for backward compatibility). See #40.
In wordseg-ag:
- niterations is now 2000 by default (was 100),
- improved logging of iterations with -vv,
- refactored postprocessing code:
  - parallelized
  - constant memory usage (was linear with respect to niterations*nutts)
  - tree to words conversion in C++ instead of Python
  - temporary parses file is now gzipped (a factor of 20 less disk usage)
  - new --temdir option to specify another directory for temporary files (default is /tmp)
  - detection of incomplete parses (a warning is issued if any are found)
  - better comments in code, more unit tests
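The wordseg-ag postprocessing changes combine three ideas: parses are stored in a gzipped temporary file, they are read back one at a time so memory does not grow with the number of stored parses, and each parse tree is converted to a segmented utterance. The Python sketch below shows those ideas together. It is illustrative only, since the release moved the real tree-to-words conversion to C++.

The following are hypothetical: the bracketed tree format, the `Word` node label, and the names `tree_to_words`, `write_parses`, `iter_words`, and `tmpdir`.

```python
import gzip
import os
import tempfile


def tree_to_words(tree, word_label="Word"):
    """Turn one bracketed parse into space-separated words.

    Hypothetical format: nodes are "(Label child ...)", leaves are bare
    tokens, and all leaves under a `word_label` node form one word.
    """
    # Pad brackets so split() yields "(", labels, leaves and ")"
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    words = []         # completed words, in order
    current = []       # leaves of the word being built
    stack = []         # labels of the currently open nodes
    word_depth = None  # stack depth of the open word node, if any
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            label = tokens[i + 1]  # "(" is always followed by a label
            stack.append(label)
            if label == word_label and word_depth is None:
                word_depth = len(stack)
            i += 2
            continue
        if tok == ")":
            if word_depth is not None and len(stack) == word_depth:
                words.append("".join(current))  # closing the word node
                current = []
                word_depth = None
            stack.pop()
        elif word_depth is not None:
            current.append(tok)  # a leaf inside a word
        i += 1
    return " ".join(words)


def write_parses(parses, tmpdir=None):
    """Store parses, one per line, in a gzipped temporary file.

    tmpdir=None lets tempfile pick the system default directory.
    """
    fd, path = tempfile.mkstemp(suffix=".gz", dir=tmpdir)
    os.close(fd)  # gzip.open reopens the file by name
    with gzip.open(path, "wt", encoding="utf8") as f:
        for parse in parses:
            f.write(parse + "\n")
    return path


def iter_words(path):
    """Yield one segmented utterance per stored parse, streaming the file."""
    with gzip.open(path, "rt", encoding="utf8") as f:
        for line in f:  # only one line is held in memory at a time
            yield tree_to_words(line.strip())


if __name__ == "__main__":
    parse = "(Sentence (Word (Phoneme a) (Phoneme b)) (Word (Phoneme c)))"
    path = write_parses([parse, parse])
    try:
        for utterance in iter_words(path):
            print(utterance)  # prints "ab c" (twice)
    finally:
        os.remove(path)
```

Because `iter_words` is a generator, the caller decides what to keep. Aggregating counts on the fly, rather than collecting every parse in a list, is what keeps memory use independent of the number of iterations.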
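The three values accepted by the wordseg-tp --dependency option can be read as follows:

- ftp: forward transitional probability, P(b | a)
- btp: backward transitional probability, P(a | b)
- mi: mutual information between adjacent units a and b

The sketch below computes all three for every adjacent pair of units in a toy corpus. It assumes the following, none of which is taken from wordseg:

- marginals are estimated from pair counts;
- mi is the pointwise value log2 P(a, b) / (P(a) P(b));
- the function name `dependencies` is hypothetical.

wordseg's own estimates, and how it turns scores into word boundaries, may differ.

```python
import math
from collections import Counter


def dependencies(utterances, measure="ftp"):
    """Score every adjacent pair of units (e.g. syllables) in a corpus.

    utterances: list of utterances, each a list of units.
    measure: "ftp", "btp" or "mi" (names borrowed from --dependency).
    """
    if measure not in ("ftp", "btp", "mi"):
        raise ValueError(f"unknown measure: {measure!r}")

    pairs = Counter()
    for units in utterances:
        pairs.update(zip(units, units[1:]))  # adjacent pairs within an utterance

    left = Counter()   # how often each unit is the first element of a pair
    right = Counter()  # how often each unit is the second element of a pair
    for (a, b), count in pairs.items():
        left[a] += count
        right[b] += count
    total = sum(pairs.values())

    scores = {}
    for (a, b), count in pairs.items():
        if measure == "ftp":    # forward TP: P(b | a)
            scores[(a, b)] = count / left[a]
        elif measure == "btp":  # backward TP: P(a | b)
            scores[(a, b)] = count / right[b]
        else:                   # pointwise MI: log2 P(a, b) / (P(a) P(b))
            scores[(a, b)] = math.log2(count * total / (left[a] * right[b]))
    return scores


if __name__ == "__main__":
    corpus = [["a", "b", "c"], ["a", "b", "d"]]
    for m in ("ftp", "btp", "mi"):
        print(m, dependencies(corpus, m))
```

In this toy corpus, "b" is always preceded by "a" but followed by either "c" or "d". The measures reflect this: ftp scores the pair (b, c) at 0.5, whereas btp scores it at 1.0.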
Files
Name | Size | Checksum
---|---|---
bootphon/wordseg-v0.7.zip | 266.2 kB | md5:71414eb7e2de12c630a9fd48817c70a1
Additional details
Related works
- Is supplement to https://github.com/bootphon/wordseg/tree/v0.7 (URL)