Published June 26, 2018 | Version v0.7
Software Open

bootphon/wordseg: wordseg-0.7

  • 1. Bootphon Project
  • 2. CNRS

Description

  • Added tools/wordseg-qsub.sh, a script to schedule a list of segmentation jobs to a cluster running Sun Grid Engine and the qsub scheduler.

  • Added example phonological rules and updated the contributing guide in the documentation.

  • In wordseg-prep ignore empty lines in both gold and segmented texts.

  • In wordseg-syll, syllabification is improved: words with no vowel are now syllabified, and error messages are clearer (see #35, #36).

  • In wordseg-tp, added the mutual information dependency measure. On the command line, the argument --probability {forward,backward} is replaced by --dependency {ftp,btp,mi}; the old argument is maintained for backward compatibility. See #40.

  • In wordseg-ag:

    • niteration is now 2000 by default (was 100),
    • improved log of iterations with -vv,
    • refactored postprocessing code:

      • parallelized
      • constant memory usage (was linear wrt niterations*nutts)
      • tree to words conversion in C++ instead of Python
      • temporary parses file is now gzipped (gains a factor of 20 in disk usage)
      • new --tempdir option to specify another path for the temporary file (default is /tmp)
      • detection of incomplete parses (a warning is issued if any are found)
      • better comments in code, more unit tests
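The three values accepted by --dependency in wordseg-tp (ftp, btp, mi) name forward transitional probability, backward transitional probability and mutual information. As a rough illustration of how these measures differ, here is a minimal sketch using their textbook definitions over adjacent pairs of units. It is not the wordseg implementation: the function name dependency_scores, its arguments and the toy input are assumptions made for this example, and the normalization wordseg applies may differ.

```python
import math
from collections import Counter


def dependency_scores(units, measure="ftp"):
    """Score every adjacent pair (x, y) found in a sequence of units.

    Hypothetical helper, not part of wordseg. `measure` is one of:
      "ftp": forward transitional probability, count(xy) / count(x)
      "btp": backward transitional probability, count(xy) / count(y)
      "mi":  pointwise mutual information, log2(P(xy) / (P(x) * P(y)))
    """
    unigrams = Counter(units)
    bigrams = Counter(zip(units, units[1:]))
    n_unigrams = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())

    scores = {}
    for (x, y), count in bigrams.items():
        if measure == "ftp":
            scores[(x, y)] = count / unigrams[x]
        elif measure == "btp":
            scores[(x, y)] = count / unigrams[y]
        elif measure == "mi":
            p_xy = count / n_bigrams
            p_x = unigrams[x] / n_unigrams
            p_y = unigrams[y] / n_unigrams
            scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
        else:
            raise ValueError("measure must be 'ftp', 'btp' or 'mi'")
    return scores


# Toy input: a short sequence of syllable-like units (made up for the example).
example_units = "ba bi bu ba bi bu da ba bi".split()
for name in ("ftp", "btp", "mi"):
    print(name, dependency_scores(example_units, name))
```

Under all three measures a low score between two adjacent units suggests a weak dependency, which is the kind of position a transitional-probability segmenter treats as a candidate word boundary.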

Files

bootphon/wordseg-v0.7.zip (266.2 kB)
md5:71414eb7e2de12c630a9fd48817c70a1
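The md5 value listed with the archive can be used to check that a download is intact. The following is a minimal sketch using Python's standard hashlib module. The helper names md5_of_file and matches_published_md5 are made up for this example, and the local file path passed to them is whatever name the archive was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = published.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For the archive above, the call would be matches_published_md5(local_path, "md5:71414eb7e2de12c630a9fd48817c70a1"), where local_path points to the downloaded zip file.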
