soedinglab/MMseqs2: MMseqs2 Release 14-7e284

Milot Mirdita; Martin Steinegger; larsdriesch; ClovisG; Eli Levy Karin; RuoshiZ; Annika Jochheim; Clovis Norroy; Hans-Georg Sommer; Florian Breitwieser; Hayden Hyunjoo Ji; Johannes Soeding; Michael R. Crusoe; Shyam Saladi; Antonio Fernandez-Guerra; Benjamin Lee; Huan Fan; Luiz Irber; Mark Wilson; Silas Kieser; Tony E Lewis; cutecutecat

doi:10.5281/zenodo.7194177

Published October 13, 2022 | Version 14-7e284

Software | Open access

Affiliations

1. @soedinglab
2. Seoul National University
3. LJK-GINP
4. Max-Planck Institute for biophysical Chemistry
5. Max Planck Institute
6. Max-Planck institute for biophysical chemistry
7. @common-workflow-language
8. @clemlab, Caltech
9. University of Wisconsin - Madison
10. @10xgenomics
11. Southern University of Science and Technology

This major release contains features implemented for ColabFold, Foldseek, and MMseqs2 profile-profile search (not yet published and still in preview), as well as many bug fixes. Many thanks to all contributors who submitted bug fixes.

If you use the Docker Hub-based MMseqs2 containers, please switch to the new GitHub Container Registry-based ones. The Docker Hub containers will no longer be maintained.

Breaking

The profile k-mer threshold parameters were refitted to the new pseudo-count parameters (--pca, --pcb). Previously used --k-score values will therefore result in a different sensitivity. However, most users will have set -s instead, which was refitted to match the previous sensitivity as closely as possible.

Features

gff2db should now work correctly after refactoring (488df863, thanks @RuoshiZhang)
result2msa now supports reading from a precomputed index
Add db2tar: Create a tar file from a database
Add parsable columnar TSV output to databases with --tsv
Add taxonomic filtering during prefilter with --taxon-list
Add --comp-bias-corr-scale to adjust the weight of the compositional bias correction
Add --mask-prob parameter to adjust tantan's masking threshold
Add context specific pseudo-counts to result2profile
Add iterative profile-profile search workflow (thanks @haydenji0731)
Add support for profile-profile scoring in striped Smith-Waterman algorithm (thanks @haydenji0731)
Add support for gap-open/gap-close costs to striped Smith-Waterman algorithm (thanks @hgsommer)
Add environment variable MMSEQS_IGNORE_INDEX to ignore an existing precomputed index
createsubdb and view can now look up entries by their identifiers in the .lookup file with --id-mode 1
Change the compressdb loop to OpenMP static scheduling to preserve the entry order
Improvements to nucleotide alignments and scoring (thanks @AnnSeidel)
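As a hedged illustration of how several of the new search options fit together, the Python sketch below only assembles the argument list and environment for an mmseqs search call; it does not run MMseqs2. The helper name build_search_command and the database paths are hypothetical, the parameter values are arbitrary examples rather than recommendations, and the sketch assumes that search forwards --taxon-list to the prefilter (the target database must carry taxonomy information) and that MMSEQS_IGNORE_INDEX only needs to be present, not set to a specific value.

```python
import os
from typing import Dict, List, Optional, Tuple


def build_search_command(
    query_db: str,
    target_db: str,
    result_db: str,
    tmp_dir: str,
    taxon_list: Optional[str] = None,
    comp_bias_corr_scale: Optional[float] = None,
    mask_prob: Optional[float] = None,
    ignore_index: bool = False,
) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical helper: build argv and env for an `mmseqs search` call.

    Only options named in the Release 14 notes are exposed; the values a
    caller passes are illustrative, not recommended settings.
    """
    argv = ["mmseqs", "search", query_db, target_db, result_db, tmp_dir]
    if taxon_list is not None:
        # Taxonomic filtering during the prefilter (assumes a taxonomy-annotated target DB).
        argv += ["--taxon-list", taxon_list]
    if comp_bias_corr_scale is not None:
        # Weight of the compositional bias correction.
        argv += ["--comp-bias-corr-scale", str(comp_bias_corr_scale)]
    if mask_prob is not None:
        # Masking threshold passed to tantan.
        argv += ["--mask-prob", str(mask_prob)]
    env = dict(os.environ)
    if ignore_index:
        # Assumption: setting the variable at all is what matters; "1" is arbitrary.
        env["MMSEQS_IGNORE_INDEX"] = "1"
    return argv, env


if __name__ == "__main__":
    argv, env = build_search_command(
        "queryDB", "targetDB", "resultDB", "tmp",
        taxon_list="2", comp_bias_corr_scale=0.5, mask_prob=0.9, ignore_index=True,
    )
    print(" ".join(argv))
```

On a machine with MMseqs2 installed, the returned pair could then be passed to subprocess.run(argv, env=env, check=True). Keeping command construction separate from execution makes the chosen options easy to inspect or log before a long search starts.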

Features built for ColabFold now available in MMseqs2

Add pairaln: taxonomy-based pairing of sequences for MSA building (9a0df0d2, 5e245d17, 3f8695ea, 3e92abf7, edb8223d, e19df7ce)
Add A3M support to result2msa (--msa-format-mode 5)
Add A3M support with alignment information to result2msa (--msa-format-mode 6)
result2profile allows --diff 0
Make the taxonomy mapping mmap-able for (near) instant loading
Add tsv2exprofiledb: workflow to create an expandable profile (profile-profile) database from TSV files
Enable result2profile and filterresult to read the new expand alignment index
Add support for filtering MSAs in buckets to filterresult and result2profile
Add --filter-min-enable to enable filtering only above a minimum threshold of hits (c6d8ae0c)
Expand can filter in each target cluster before expanding (75af0c82, 85ce8472)

Bugfixes

summarizeresult was rejecting hits that match the coverage threshold exactly (#586, 67949d70)
Don't use reserved filename characters in unpackdb (#467, c6634976 thanks @cutecutecat)
Fix typo (violoations -> violations) (#526, 74c3aa65, thanks @Benjamin-Lee)
Fix potential endless loop in rescorediagonal
Fix prefilter/alignment with 0-size query input #433
Fix unpackdb parameter parsing issue
Make sure FILTER_RESULT variable is always correctly set for exhaustive search (d4a33542)
Fix tar2db breaking with --tar-include/--tar-exclude (#561)
Fix wrong database name printed for variadic input when creating a tmp directory
Fix extractorfs sometimes loading invalid start/stop codons on non-AVX2 platforms
Don't mask consensus sequences in profiles
result2msa now correctly prints X residues
Allocate CSProfile only if it's going to be used (d8736973)
Taxonomy db paths are now correctly found if given a precomputed index (8ff26f23)
Encode more strings internally as base64 if special characters are used (16b57741, d1555862)
Disable broken iterative profile searches in taxonomy (#432)
Fixed a possible segmentation fault in align (thanks @rchikhi)

MMseqs2 databases

Added VOGDB
Updated dbCAN2 to V9 and removed .aln suffix from profile names
Fix issues with ResFinder (#494, 56816b39), GTDB (#561, 678c82ac), Kalamari (#531, ce7bf53b), UniRef (#496, e85ceb9, thanks to @fanhuan)

Speedup

Rework result2msa to avoid allocating large amounts of memory
Faster ungapped alignment in the prefilter
TaxonomyExpression is faster with a single tax identifier (8ff72796)

MMseqs2 subprojects

MMseqs2-based subprojects can use databases too (5afd33c3)
Add appenddbtoindex: augment a precomputed index with other databases in sub-projects
Allow subprojects to build their own precomputed indices (a506d677)
Add support for external k-mer thresholds for the prefilter (fea8d203)
Subprojects can define their own DbType validators

Developers

Added CirrusCI to test FreeBSD and old compilers (a2e2129c, 904d0c6d, a09a704e, 4f1996a4, 482dedc6, 16830a52)
MMseqs2 Docker containers are now published in the GitHub Container Registry (eb203d35, 5185d3cb, ba4e11f1)
Our microtar fork can write tar files again (dcd180be)
Add URIs as allowed parameter inputs (3b9cf881)
Additional s390x fixes (linclust might work now)
Add support for a new MultiParameter type
Bundled SIMDe was updated (thanks @mr-c)

Files

soedinglab/MMseqs2-14-7e284.zip (13.8 MB, md5:12ba756605cacb9a3b7bb5fc2a1cb749)

Additional details

Is supplement to: https://github.com/soedinglab/MMseqs2/tree/14-7e284
