Published January 7, 2022 | Version v0.5.0
Software
Open
BigDataBiology/SemiBin: Version 0.5.0
Description
Version 0.5
Released January 7 2022
User-visible improvements
- Reclustering is now the default (use `--no-recluster` to disable it; the option `--recluster` is deprecated and ignored) as the computational costs are much lower
- GTDB lazy downloading is now performed even if a non-standard directory is used
- The CACHEDIR.TAG protocol was implemented (this is supported by several tools that perform tasks such as backups).
- Fix bug with `--min-len` (minimal length). Previously, only contigs strictly longer than the given minimal length were used (instead of contigs with length greater than or equal to the minimal length).
- GTDB downloading was inconsistent in a few instances, which have been fixed
- Much more efficient code (including lower memory usage) for binning, especially if a pretrained model is used. As an example, using a deeply-sequenced ocean sample, generating the data (`generate_data_single` step) goes down from 14 to 9 minutes, while binning (`bin` step, using `--recluster`) goes down from 10m17s (using 20GB of RAM at peak) to 4m33s (using 4.5GB at peak). Thus, total time from BAM file to bins went down from 25 to 14 minutes (using 4 threads) and peak RAM is now 4.5GB, making it usable on a typical laptop.
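
Two of the items above are small enough to sketch in code: the inclusive `--min-len` comparison and the CACHEDIR.TAG marker file. The following is a minimal illustration and not SemiBin's actual implementation. The function names `filter_contigs` and `mark_cache_dir` and the dict-of-lengths input are hypothetical; only the signature line comes from the CACHEDIR.TAG specification.

```python
from pathlib import Path

# Fixed first line required by the CACHEDIR.TAG specification.
CACHEDIR_TAG_SIGNATURE = "Signature: 8a477f597d28d172789f06886806bc55"


def filter_contigs(lengths, min_len):
    """Keep contigs whose length is >= min_len (the inclusive behaviour).

    `lengths` is a hypothetical mapping of contig name -> length in bp;
    SemiBin's internal data structures may look different.
    """
    # The bug described above corresponds to using `>` here, which drops
    # contigs that are exactly `min_len` long.
    return {name: n for name, n in lengths.items() if n >= min_len}


def mark_cache_dir(directory):
    """Create `directory` if needed and tag it as a cache directory."""
    path = Path(directory)
    path.mkdir(parents=True, exist_ok=True)
    tag = path / "CACHEDIR.TAG"
    if not tag.exists():
        # The signature must be the first bytes of the file; the comment
        # line after it is free-form.
        tag.write_text(
            CACHEDIR_TAG_SIGNATURE + "\n"
            "# This directory holds cached data (illustrative sketch).\n"
        )
    return tag


if __name__ == "__main__":
    contigs = {"c1": 999, "c2": 1000, "c3": 2500}
    print(filter_contigs(contigs, 1000))  # c2 (exactly 1000 bp) is kept
```

Backup tools that honour the protocol look for a file named `CACHEDIR.TAG` starting with that signature and can then skip the directory.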
Files
Name | Size | MD5 |
---|---|---|
BigDataBiology/SemiBin-v0.5.0.zip | 22.1 MB | 5aa73978f2e5445f5a80f4bf9a0f69de |
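
The MD5 checksum listed for the archive can be used to check a download. A minimal sketch, assuming the file was saved locally as `SemiBin-v0.5.0.zip` (the local path is an assumption; only the checksum value comes from this record):

```python
import hashlib

# Checksum as listed for the v0.5.0 archive on this record.
EXPECTED_MD5 = "5aa73978f2e5445f5a80f4bf9a0f69de"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    import sys

    # Hypothetical local filename; pass another path as the first argument.
    archive = sys.argv[1] if len(sys.argv) > 1 else "SemiBin-v0.5.0.zip"
    print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
```

MD5 is adequate for detecting a corrupted or truncated download, but it is not a defence against deliberate tampering.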
Additional details
Related works
- Is supplement to: https://github.com/BigDataBiology/SemiBin/tree/v0.5.0 (URL)