Published March 28, 2025 | Version v1
Dataset
Open
Supporting data for the paper "CREMSA: Compressed Indexing of (Ultra) Large Multiple Sequence Alignments"
Creators
Description
Four files used in the paper “CREMSA: Compressed Indexing of (Ultra) Large Multiple Sequence Alignments” are made available here for reproducibility:
random_datasets_len10000_num30000.zip
: An archive of artificial FASTA files generated as described in the paper.
HIV1_ALL_2022_genome_DNA.fasta.xz
: A multiple sequence alignment of 5,381 HIV-1 genomes, retrieved from the Los Alamos National Laboratory in March 2025.
nextstrain_groups_LANL-HIV-DB_HIV_genome_timetree.jsonl.gz
: A JSONL file, as produced by Nextstrain, of the phylogeny of 3,090 HIV genomes among the 5,381 from the previous file.
MFS_1.fasta.xz
: A multiple sequence alignment of 214,283 protein sequences of the Major Facilitator Superfamily (MFS), retrieved from Pfam in March 2025.
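
The compressed files can be streamed without unpacking them to disk first. The following is a minimal sketch using only the Python standard library; the helper names `read_fasta`, `read_fasta_xz`, and `read_jsonl_gz` are hypothetical and are not part of CREMSA, and the sketch assumes plain FASTA records and one JSON value per line, without assuming anything about the fields inside the Nextstrain records.

```python
import gzip
import json
import lzma


def read_fasta(handle):
    """Yield (header, sequence) pairs from a text-mode FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            # Sequence lines (gap characters included) are concatenated.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fasta_xz(path):
    """Stream records from an xz-compressed FASTA file (hypothetical helper)."""
    with lzma.open(path, "rt", encoding="utf-8") as handle:
        yield from read_fasta(handle)


def read_jsonl_gz(path):
    """Stream parsed JSON values from a gzip-compressed JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)
```

For example, iterating over `read_fasta_xz("MFS_1.fasta.xz")` yields one aligned sequence at a time, which keeps memory use low for the larger alignment. Since the files are multiple sequence alignments, checking that every yielded sequence has the same length is a simple sanity test.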