Published October 13, 2022 | Version 14-7e284
Software
Open
soedinglab/MMseqs2: MMseqs2 Release 14-7e284
Authors/Creators
- Milot Mirdita (1)
- Martin Steinegger (2)
- larsdriesch
- ClovisG (3)
- Eli Levy Karin (4)
- RuoshiZ
- Annika Jochheim (5)
- Clovis Norroy
- Hans-Georg Sommer
- Florian Breitwieser
- Hayden Hyunjoo Ji
- Johannes Soeding (6)
- Michael R. Crusoe (7)
- Shyam Saladi (8)
- Antonio Fernandez-Guerra
- Benjamin Lee
- Huan Fan (9)
- Luiz Irber (10)
- Mark Wilson
- Silas Kieser
- Tony E Lewis
- cutecutecat (11)
- 1. @soedinglab
- 2. Seoul National University
- 3. LJK-GINP
- 4. Max-Planck Institute for biophysical Chemistry
- 5. Max Planck Institute
- 6. Max-Planck institute for biophysical chemistry
- 7. @common-workflow-language
- 8. @clemlab, Caltech
- 9. University of Wisconsin - Madison
- 10. @10xgenomics
- 11. Southern University of Science and Technology
Description
This is a major release containing features implemented for ColabFold, Foldseek, MMseqs2 profile-profile (not published yet, and still in preview) and many bug fixes. Thanks a lot to the contributors who submitted bug fixes.
If you are using the Docker Hub based MMseqs2 containers, please switch to the new GitHub Container Registry based ones. The Docker Hub containers will not be maintained in the future.
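For scripts or pipeline configs that still reference the Docker Hub image, the switch can be sketched as a small rewrite helper. This is a minimal Python sketch and not part of MMseqs2: the image paths `soedinglab/mmseqs2` (Docker Hub) and `ghcr.io/soedinglab/mmseqs2` (GitHub Container Registry) and the function name `to_ghcr` are assumptions, and tags are carried over unchanged, which presumes the same tag exists in both registries.

```python
# Assumed image locations; verify them against the MMseqs2 README before use.
DOCKER_HUB_NAMES = ("docker.io/soedinglab/mmseqs2", "soedinglab/mmseqs2")
GHCR_NAME = "ghcr.io/soedinglab/mmseqs2"


def to_ghcr(image: str) -> str:
    """Rewrite a Docker Hub MMseqs2 image reference to its GHCR counterpart.

    Tags (":tag") and digests ("@sha256:...") are kept as-is.
    References to any other image are returned unchanged.
    """
    for hub in DOCKER_HUB_NAMES:
        if image == hub or image.startswith((hub + ":", hub + "@")):
            return GHCR_NAME + image[len(hub):]
    return image


if __name__ == "__main__":
    for ref in ("soedinglab/mmseqs2", "docker.io/soedinglab/mmseqs2:latest"):
        print(ref, "->", to_ghcr(ref))
```

Applying the helper to every image reference in a workflow file leaves unrelated images untouched, so it is safe to run over mixed configurations.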
Breaking
- Profile k-mer threshold parameters were fitted to the new pseudo-count parameters (`--pca`, `--pcb`). Previous `--k-score` parameters will have differing sensitivity. However, most users will have set `-s` instead, which was fitted to match as closely as possible.
- `gff2db` now should actually work correctly after refactoring (488df863, thanks @RuoshiZhang)
- `result2msa` now supports reading from precomputed index
- Add `db2tar`: Create a tar file from a database
- Add parsable columnar tsv output to `databases` with `--tsv`
- Add taxonomic filtering during `prefilter` with `--taxon-list`
- Add `--comp-bias-corr-scale` to adjust the weight of the compositional bias correction
- Add `--mask-prob` parameter to adjust tantan's masking threshold
- Add context specific pseudo-counts to `result2profile`
- Add iterative profile-profile search workflow (thanks @haydenji0731)
- Add support for profile-profile scoring in striped Smith-Waterman algorithm (thanks @haydenji0731)
- Add support for gap-open/gap-close costs to striped Smith-Waterman algorithm (thanks @hgsommer)
- Add environment variable `MMSEQS_IGNORE_INDEX` to ignore an existing precomputed index
- `createsubdb` and view can now return results from identifiers in `.lookup` with `--id-mode 1`
- Change `compressdb` loop to `omp static` to keep order
- Improvements to nucleotide alignments and scoring (thanks @AnnSeidel)
- Add `pairaln`: taxonomic pairing on sequences for MSA building (9a0df0d2, 5e245d17, 3f8695ea, 3e92abf7, edb8223d, e19df7ce)
- Add A3M support to `result2msa` (`--msa-format-mode 5`)
- Add A3M support with alignment information to `result2msa` (`--msa-format-mode 6`)
- `result2profile` allows `--diff 0`
- Make taxonomy mapping mmap'able for (near) instant read-in
- Add workflow to create expandable profile (profile-profile) db from TSVs (`tsv2exprofiledb`)
- Enable `result2profile`/`filterresult` to read new expand alignment index
- Add support to filter MSAs in buckets (`filterresult`, `result2profile`)
- Add `--filter-min-enable` to enable filtering only above a minimum threshold of hits (c6d8ae0c)
- Expand can filter in each target cluster before expanding (75af0c82, 85ce8472)
- `summarizeresult` was rejecting hits that match the coverage threshold exactly (#586, 67949d70)
- Don't use reserved filename characters in unpackdb (#467, c6634976 thanks @cutecutecat)
- Fix typo (violoations -> violations) (#526, 74c3aa65, thanks @Benjamin-Lee)
- Fix potential endless loop in `rescorediagonal`
- Fix prefilter/alignment with 0-size query input #433
- Fix `unpackdb` parameter parsing issue
- Make sure `FILTER_RESULT` variable is always correctly set for exhaustive search (d4a33542)
- `tar2db` breaking with `--tar-include/exclude` (#561)
- Wrong database name printed for variadic input when creating a tmp directory
- `extractorfs` sometimes loading invalid start/stop codons on non-avx2 platforms
- Don't mask consensus sequences in profiles
- `result2msa` correctly prints X residues
- Allocate `CSProfile` only if it's going to be used (d8736973)
- Taxonomy db paths are now correctly found if given a precomputed index (8ff26f23)
- Encode more strings internally as base64 if special characters are used (16b57741, d1555862)
- Disable broken iterative profile searches in taxonomy (#432)
- Fixed a possible segmentation fault in `align` (thanks @rchikhi)
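The coverage fix listed above (#586) concerns a boundary case: a hit whose coverage equals the threshold exactly should be kept. The Python sketch below only illustrates that kind of boundary comparison; it is not the MMseqs2 implementation, and the function name `passes_coverage` and its parameters are hypothetical.

```python
def passes_coverage(aligned_len: int, seq_len: int, min_cov: float,
                    inclusive: bool = True) -> bool:
    """Return True if the covered fraction of a sequence meets the threshold.

    inclusive=True  keeps hits whose coverage equals min_cov (>=).
    inclusive=False reproduces a strict comparison (>), which drops them.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    coverage = aligned_len / seq_len
    return coverage >= min_cov if inclusive else coverage > min_cov


if __name__ == "__main__":
    # A hit covering exactly 80 of 100 residues, checked against a 0.8 threshold.
    print("strict comparison keeps hit:   ", passes_coverage(80, 100, 0.8, inclusive=False))
    print("inclusive comparison keeps hit:", passes_coverage(80, 100, 0.8))
```

With a strict comparison the hit at exactly 0.8 coverage is dropped, while the inclusive comparison keeps it, which matches the behavior described in the fix.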
databases
- Added VOGDB
- Updated dbCAN2 to V9 and removed `.aln` suffix from profile names
- Fix issues with ResFinder (#494, 56816b39), GTDB (#561, 678c82ac), Kalamari (#531, ce7bf53b), Uniref (#496, e85ceb9, thanks to @fanhuan)
- Rework of `result2msa` to avoid allocating a lot of memory
- Improvement of speed for ungapped alignment in `prefilter`
- `TaxonomyExpression` is faster with a single tax identifier (8ff72796)
- MMseqs2-based subprojects can use `databases` too (5afd33c3)
- Add `appenddbtoindex`: augment a precomputed index with other databases in sub-projects
- Allow subprojects to build their own precomputed indices (a506d677)
- Add support for external k-mer thresholds for the prefilter (fea8d203)
- Subprojects can define their own DbType validators
- Added CirrusCI to test FreeBSD and old compilers (a2e2129c, 904d0c6d, a09a704e, 4f1996a4, 482dedc6, 16830a52)
- MMseqs2 Docker containers are now published in the GitHub Container Registry (eb203d35, 5185d3cb, ba4e11f1)
- Our microtar fork can write tar files again (dcd180be)
- Add URIs as allowed parameter inputs (3b9cf881)
- Additional s390x fixes (linclust might work now)
- Add support for new MultiParameter type
- Bundled SIMDe was updated (thanks @mr-c)
Files
| Name | Size |
|---|---|
| soedinglab/MMseqs2-14-7e284.zip (md5:12ba756605cacb9a3b7bb5fc2a1cb749) | 13.8 MB |
Additional details
Related works
- Is supplement to: https://github.com/soedinglab/MMseqs2/tree/14-7e284 (URL)