There is a newer version of the record available.

Published January 16, 2023 | Version v3.0.0
Software Open

jtnystrom/Discount: Version 3.0.0

Authors/Creators

  • 1. JNP Solutions, Lifematics

Description

This version adds indexes (databases of counted k-mers) and operations for combining them: intersect, union and subtract. Intersection and union support several rules for combining counts, including max, min, left and right. Most operations that could formerly be done only on raw sequence files can now also be done on indexes, with a similar syntax. Indexes are stored as bucketed parquet files, which makes repeated use of the same input data efficient, because the k-mers do not have to be shuffled again on subsequent use.
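
To make the combination rules concrete, here is a minimal sketch in plain Python that models an index as a dictionary from k-mer to count. It is not Discount's API: the function names, the `RULES` table and the exact semantics are assumptions made for illustration. In particular, it assumes that a rule is applied only to k-mers present in both indexes, and that in a union a k-mer found on one side only keeps the count it has there. Subtract is left out because the release notes do not say how it treats counts.

```python
from typing import Callable, Dict

# Hypothetical stand-in for an index: k-mer -> count.
Index = Dict[str, int]

# Rule names come from the release notes; the semantics are assumed.
RULES: Dict[str, Callable[[int, int], int]] = {
    "min": min,
    "max": max,
    "sum": lambda a, b: a + b,
    "left": lambda a, b: a,
    "right": lambda a, b: b,
}


def intersect(left: Index, right: Index, rule: str) -> Index:
    """Keep k-mers present in both indexes and combine their counts."""
    combine = RULES[rule]
    return {k: combine(c, right[k]) for k, c in left.items() if k in right}


def union(left: Index, right: Index, rule: str) -> Index:
    """Keep k-mers present in either index.

    Assumption: a k-mer found on one side only keeps its existing count.
    """
    combine = RULES[rule]
    out = dict(left)
    for k, c in right.items():
        out[k] = combine(left[k], c) if k in left else c
    return out


a = {"ACGT": 3, "CGTA": 1, "GTAC": 5}
b = {"ACGT": 2, "GTAC": 7, "TACG": 4}

print(intersect(a, b, "min"))  # {'ACGT': 2, 'GTAC': 5}
print(union(a, b, "sum"))      # {'ACGT': 5, 'CGTA': 1, 'GTAC': 12, 'TACG': 4}
```

With these assumed semantics, left and right simply pick which index supplies the count for a shared k-mer, while min, max and sum compute a new count from both.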

Indexes can be manipulated through the command-line interface, or through the API from notebooks or the Spark shell.

Other improvements include:

  • Support for indexes (k-mer databases) written as parquet files.
  • Index operations such as union, intersect and subtract, with combination rules such as min, max, sum, left and right.
  • Restructured the API to use indexes as much as possible.
  • Several operations were moved to Spark SQL (from handcrafted Scala) for performance and simplicity.
  • Run scripts were renamed and can now detect their location, which makes it easy to symlink them to somewhere in $PATH.
  • The new -p flag is now the preferred way to specify the number of partitions.
  • Most commands that take input can now read input from an index (using -i) as well as from sequence files.
  • K-mer counts are now consistently represented as Int instead of Long in the user API, since they were already limited to 32-bit signed integers internally.
  • Added com.globalmentor's hadoop-bare-naked-local-fs to avoid dependency on winutils.exe on Windows when running tests.
  • Various simplifications and speedups.
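
The benefit of bucketed storage mentioned in the description can be illustrated with a small sketch. It is plain Python rather than Spark, and everything in it is an assumption for illustration: `bucket_of`, `build_index` and `intersect_min` are hypothetical helpers, and `zlib.crc32` stands in for whatever hash function the real bucketing uses. The idea shown is only that, once two indexes are bucketed with the same function and the same number of buckets, equal k-mers always sit in buckets with the same number, so the indexes can be combined bucket by bucket without regrouping the data.

```python
import zlib
from typing import Dict, List

Bucketed = List[Dict[str, int]]  # one k-mer -> count dictionary per bucket


def bucket_of(kmer: str, n_buckets: int) -> int:
    # Stable hash used as a stand-in; not the hash Spark uses.
    return zlib.crc32(kmer.encode("ascii")) % n_buckets


def build_index(counts: Dict[str, int], n_buckets: int) -> Bucketed:
    """Group counted k-mers into buckets once, when the index is written."""
    buckets: Bucketed = [{} for _ in range(n_buckets)]
    for kmer, count in counts.items():
        buckets[bucket_of(kmer, n_buckets)][kmer] = count
    return buckets


def intersect_min(x: Bucketed, y: Bucketed) -> Bucketed:
    """Intersect two indexes pairwise by bucket, keeping the smaller count."""
    if len(x) != len(y):
        raise ValueError("indexes must use the same number of buckets")
    return [
        {k: min(c, by[k]) for k, c in bx.items() if k in by}
        for bx, by in zip(x, y)
    ]


a = {"ACGT": 3, "CGTA": 1, "GTAC": 5}
b = {"ACGT": 2, "GTAC": 7, "TACG": 4}

ia = build_index(a, n_buckets=4)
ib = build_index(b, n_buckets=4)

# Flatten the per-bucket results to compare with a direct intersection.
merged = {k: c for bucket in intersect_min(ia, ib) for k, c in bucket.items()}
print(merged == {"ACGT": 2, "GTAC": 5})
```

Each bucket pair is processed independently, which is why an index written once in this layout can be reused many times without paying the grouping cost again.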

Files

jtnystrom/Discount-v3.0.0.zip (13.3 MB)
md5:e19f03d798f399ca9acfd6cecdb37547

Additional details

Related works