Published May 6, 2022 | Version 2.0
Dataset
Open
extHomFam 2: large-scale benchmark for protein multiple sequence alignments
Description
extHomFam 2 was constructed by combining Homstrad reference alignments (March 2020) with the complete Pfam 33.1 families (NCBI variant). Homstrad entries with fewer than 3 reference sequences, as well as those mapped to dead (no longer existing) Pfam families, were discarded. The remaining families were divided into subsets according to family size N (the number of sequences in the family):
| subset | N range | # families |
|---|---|---|
| small | [200, 10 000) | 86 |
| medium | [10 000, 40 000) | 95 |
| large | [40 000, 100 000) | 83 |
| xlarge | [100 000, 250 000) | 67 |
| huge | [250 000, 3 000 000) | 62 |
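The ranges are half-open intervals, so a family with exactly 10 000 sequences belongs to medium, not small. A minimal Python sketch of the filtering and binning rules described above, assuming that for each entry the number of reference sequences, whether its Pfam family is still live, and the family size N are already known; the function names are hypothetical and are not part of any released tooling:

```python
from typing import Optional

# Subset boundaries from the table above, as half-open intervals [low, high).
SUBSETS = [
    ("small", 200, 10_000),
    ("medium", 10_000, 40_000),
    ("large", 40_000, 100_000),
    ("xlarge", 100_000, 250_000),
    ("huge", 250_000, 3_000_000),
]


def keep_entry(num_ref_seqs: int, pfam_family_alive: bool) -> bool:
    """Hypothetical filter: drop Homstrad entries with fewer than 3
    reference sequences or mapped to a dead Pfam family."""
    return num_ref_seqs >= 3 and pfam_family_alive


def assign_subset(n: int) -> Optional[str]:
    """Return the subset whose [low, high) range contains family size n,
    or None if n lies outside every range."""
    for name, low, high in SUBSETS:
        if low <= n < high:
            return name
    return None
```

For example, `assign_subset(9_999)` gives `"small"` while `assign_subset(10_000)` gives `"medium"`. Summed over the table, the five subsets contain 393 families.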
The archive contains one directory per subset, named after it (small, medium, large, xlarge, huge), while the reference alignments are located in the 'ref' folder.
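The description does not state whether 'ref' sits at the top level of the archive or inside each subset directory, so a layout-agnostic way to inspect the download is to count files per directory with Python's standard zipfile module. This is a sketch; it assumes no particular file names or extensions inside the archive, and the function name is hypothetical:

```python
import posixpath
import zipfile
from collections import Counter


def files_per_directory(archive_path: str) -> Counter:
    """Count regular files under each directory of a zip archive.

    Zip member names always use '/' as the separator, so posixpath is used
    regardless of the host OS. Explicit directory entries (names ending in
    '/') are skipped; files at the archive root are counted under ''.
    Only the archive's central directory is read; nothing is decompressed.
    """
    with zipfile.ZipFile(archive_path) as archive:
        return Counter(
            posixpath.dirname(name)
            for name in archive.namelist()
            if not name.endswith("/")
        )
```

For example, iterating over `sorted(files_per_directory("extHomFam-v2.zip").items())` prints each directory path with its file count, which shows where the subset and 'ref' folders actually live.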
Files
| Name | Size | MD5 checksum |
|---|---|---|
| extHomFam-v2.zip | 3.9 GB | 40caec9385955447b110e0c1ccb3fa9d |
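Because the archive is 3.9 GB, the checksum is best computed by streaming the file rather than loading it into memory. A minimal sketch using Python's standard hashlib module to compare a downloaded copy against the MD5 checksum listed for extHomFam-v2.zip; the function names and the 1 MiB default chunk size are illustrative choices, not part of the dataset:

```python
import hashlib

# MD5 checksum published for extHomFam-v2.zip.
EXPECTED_MD5 = "40caec9385955447b110e0c1ccb3fa9d"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in fixed-size
    chunks so that a multi-gigabyte archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_archive("extHomFam-v2.zip")` returns True only if the local copy is intact. On Linux, `md5sum extHomFam-v2.zip` reports the same digest.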
Additional details
Related works
- Continues: Journal article 10.1038/srep33964 (DOI)
- Is new version of: Dataset 10.7910/DVN/BO2SVW (DOI)
References
- Deorowicz, S., Debudaj-Grabysz, A. & Gudyś, A. FAMSA: Fast and accurate multiple sequence alignment of huge protein families. Sci Rep 6, 33964 (2016). https://doi.org/10.1038/srep33964