There is a newer version of the record available.

Published January 7, 2022 | Version v0.5.0
Software Open

BigDataBiology/SemiBin: Version 0.5.0

  • 1. ISTBI, Fudan University
  • 2. Fudan University

Description

Version 0.5

Released January 7, 2022

User-visible improvements
  • Reclustering is now the default, because its computational cost is now much lower. Use --no-recluster to disable it; the option --recluster is deprecated and ignored.
  • GTDB lazy downloading is now performed even if a non-standard directory is used
  • The CACHEDIR.TAG protocol was implemented (this is supported by several tools that perform tasks such as backups).
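For readers unfamiliar with the CACHEDIR.TAG protocol mentioned above, the following is a minimal sketch of how a tool can tag a cache directory and how a backup tool can detect the tag. It is not SemiBin's actual code: the function names `tag_cache_dir` and `is_tagged_cache_dir` are hypothetical, and only the file name and the fixed signature line come from the Cache Directory Tagging Specification.

```python
from pathlib import Path

# Fixed header required by the Cache Directory Tagging Specification:
# the tag file must start with exactly these 43 bytes.
CACHEDIR_TAG_SIGNATURE = "Signature: 8a477f597d28d172789f06886806bc55"


def tag_cache_dir(cache_dir):
    """Create cache_dir if needed and mark it as a cache directory.

    Hypothetical helper for illustration; returns the path of the tag file.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    tag = cache_dir / "CACHEDIR.TAG"
    if not tag.exists():
        tag.write_text(
            CACHEDIR_TAG_SIGNATURE + "\n"
            "# This file is a cache directory tag.\n"
            "# Backup tools may skip the contents of this directory.\n"
        )
    return tag


def is_tagged_cache_dir(directory):
    """Return True if directory holds a valid CACHEDIR.TAG file."""
    tag = Path(directory) / "CACHEDIR.TAG"
    if not tag.is_file():
        return False
    expected = CACHEDIR_TAG_SIGNATURE.encode("ascii")
    with open(tag, "rb") as handle:
        return handle.read(len(expected)) == expected
```

Tools that honour the protocol (for example, backup programs) check only the leading signature bytes, so the comment lines after the signature are free-form.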
Bugfixes
  • Fix a bug with --min-len (minimal length): previously, only contigs strictly longer than the given minimal length were used; contigs whose length equals the minimal length are now included as well.
  • GTDB downloading was inconsistent in a few instances; these have been fixed
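The --min-len fix above amounts to changing a strict comparison into an inclusive one. The sketch below illustrates the corrected behaviour under simple assumptions; `filter_contigs` and the example contig names are hypothetical and do not come from the SemiBin code base.

```python
def filter_contigs(contig_lengths, min_len):
    """Return the names of contigs whose length is at least min_len.

    The comparison is inclusive (>=). The pre-0.5 behaviour described
    above corresponds to a strict comparison (>), which silently dropped
    contigs of exactly min_len.
    """
    return [name for name, length in contig_lengths.items() if length >= min_len]


# Hypothetical example data: contig name -> length in base pairs.
contigs = {"contig_1": 999, "contig_2": 1000, "contig_3": 2500}
kept = filter_contigs(contigs, min_len=1000)
print(kept)  # ['contig_2', 'contig_3']
```

With the strict comparison, `contig_2` (length exactly 1000) would have been excluded.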
Internal improvements
  • Much more efficient code (including lower memory usage) for binning, especially if a pretrained model is used. For example, on a deeply sequenced ocean sample, generating the data (generate_data_single step) went down from 14 to 9 minutes, while binning (bin step, using --recluster) went down from 10m17s (using 20 GB of RAM at peak) to 4m33s (using 4.5 GB at peak). Thus, total time from BAM file to bins went down from 25 to 14 minutes (using 4 threads) and peak RAM is now 4.5 GB, making it usable on a typical laptop.
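As a rough check on the figures quoted above, the speedup and memory reduction can be worked out directly. This is only arithmetic on the numbers in the release note; the helper `to_seconds` is introduced here for illustration.

```python
def to_seconds(minutes, seconds=0):
    """Convert a minutes/seconds pair to seconds."""
    return 60 * minutes + seconds


# Timings and peak memory as quoted in the release note.
generate_before, generate_after = to_seconds(14), to_seconds(9)
bin_before, bin_after = to_seconds(10, 17), to_seconds(4, 33)
ram_before_gb, ram_after_gb = 20.0, 4.5

bin_speedup = bin_before / bin_after          # 617 s / 273 s
ram_reduction = ram_before_gb / ram_after_gb  # 20 GB / 4.5 GB
total_before_min = (generate_before + bin_before) / 60
total_after_min = (generate_after + bin_after) / 60

print(f"bin step speedup: {bin_speedup:.2f}x")
print(f"peak RAM reduction: {ram_reduction:.2f}x")
print(f"two-step total: {total_before_min:.1f} -> {total_after_min:.1f} minutes")
```

The two listed steps add up to roughly 24 and 14 minutes; the quoted total of 25 minutes before the change presumably reflects rounding or time spent outside these two steps.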

Files

BigDataBiology/SemiBin-v0.5.0.zip

Files (22.1 MB)

md5:5aa73978f2e5445f5a80f4bf9a0f69de

Additional details

Related works