Published May 4, 2026 | Version v1
Dataset | Open access

SialoCat: a curated catalog of bacterial proteins involved in sialic acid metabolism

Description

final_fastas.tar.gz

  • initial_fastas – Initially downloaded sequences, with redundancy removed both within and across the source databases used to compile the dataset.

  • FASTAS_aligned – Subset of sequences that aligned to at least one reference sequence.

  • FASTAS_with_essential_signature – Sequences containing the required essential signature.

  • FASTAS_with_essential_aligned – Sequences that contain the essential signature and also passed the alignment filter.

  • FASTAS_without_extra – Sequences containing the essential signature and no additional non-reference signatures.

  • FASTAS_without_extra_aligned – Sequences that contain the essential signature and no additional non-reference signatures, and that also passed the alignment filter.
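Read together, the subset descriptions imply a nesting of the sequence sets: the signature-filtered sets sit inside initial_fastas, FASTAS_without_extra sits inside FASTAS_with_essential_signature, and each "_aligned" variant sits inside the intersection of its parent set with FASTAS_aligned. The sketch below checks those relations after extraction. It assumes (not stated in the record) that each subset unpacks into its own directory of FASTA files, that every header line begins with the internal ID, and that all subsets are drawn from initial_fastas; the helper names and the `*.fa*` file pattern are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the ID (first whitespace-delimited token) of every FASTA header line."""
    ids: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def ids_in_directory(directory: Path, pattern: str = "*.fa*") -> Set[str]:
    """Union of FASTA IDs over all files matching `pattern` (hypothetical layout)."""
    ids: Set[str] = set()
    for path in sorted(directory.glob(pattern)):
        with path.open() as handle:
            ids |= fasta_ids(handle)
    return ids


def check_subset_relations(s: Dict[str, Set[str]]) -> Dict[str, bool]:
    """Check the nesting implied by the subset descriptions; keys are subset names."""
    essential_and_aligned = s["FASTAS_with_essential_signature"] & s["FASTAS_aligned"]
    without_extra_and_aligned = s["FASTAS_without_extra"] & s["FASTAS_aligned"]
    return {
        "aligned <= initial": s["FASTAS_aligned"] <= s["initial_fastas"],
        "essential <= initial": s["FASTAS_with_essential_signature"] <= s["initial_fastas"],
        "without_extra <= essential": (
            s["FASTAS_without_extra"] <= s["FASTAS_with_essential_signature"]
        ),
        "essential_aligned <= essential & aligned": (
            s["FASTAS_with_essential_aligned"] <= essential_and_aligned
        ),
        "without_extra_aligned <= without_extra & aligned": (
            s["FASTAS_without_extra_aligned"] <= without_extra_and_aligned
        ),
    }
```

For example, with the archive unpacked into a hypothetical `final_fastas/` directory, `check_subset_relations({name: ids_in_directory(Path("final_fastas") / name) for name in names})` returns one boolean per relation. Subset inclusion (rather than equality) is used because the record does not say whether the alignment filter is applied identically in every subset.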

all_code_dataframes.tar.gz

Contains the mapping between original FASTA headers and the internal IDs used in the FASTA files provided in this repository.
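The mapping lets the internal IDs in the FASTA files be translated back to their original headers. The record does not describe the on-disk format of these dataframes, so the sketch below assumes a delimited text table with a header row; the column names `internal_id` and `original_header`, the tab delimiter, and both helper functions are hypothetical placeholders to adjust to the actual files.

```python
import csv
from typing import Dict, Iterable, Iterator


def load_id_mapping(
    lines: Iterable[str],
    id_column: str = "internal_id",          # hypothetical column name
    header_column: str = "original_header",  # hypothetical column name
    delimiter: str = "\t",                   # assumed delimiter
) -> Dict[str, str]:
    """Build {internal ID: original header} from a delimited table with a header row."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    # Drop a leading '>' in case the stored header kept the FASTA marker.
    return {row[id_column]: row[header_column].lstrip(">") for row in reader}


def restore_headers(fasta_lines: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Yield FASTA lines with internal IDs replaced by the original headers.

    Sequence lines, and header lines whose ID is not in the mapping, pass through unchanged.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields and fields[0] in mapping:
                yield ">" + mapping[fields[0]] + "\n"
                continue
        yield line
```

When reading the table from disk, open it with `newline=""` as the Python `csv` documentation recommends, e.g. `with open(path, newline="") as handle: mapping = load_id_mapping(handle)`. Streaming the FASTA through a generator keeps memory use flat even for large files.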

Files (1.5 GB)

  • 99.2 MB – md5:c65fe60a24a644660f17757115b5f54c

  • 1.4 GB – md5:49264ab961cd8255c96e574278f1508e
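Given the size of the archives, it is worth checking each download against its listed MD5 checksum before unpacking it. The sketch below uses only the Python standard library (`hashlib`, `tarfile`); the helper names are illustrative, and the caller supplies the checksum listed next to the matching file size.

```python
import hashlib
import tarfile
from pathlib import Path


def md5sum(path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest, reading 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected: str) -> bool:
    """Compare a file with a checksum written as listed above, e.g. 'md5:<hex digest>'."""
    expected_hex = expected.strip().lower().removeprefix("md5:")
    return md5sum(path) == expected_hex


def extract_archive(archive, destination) -> None:
    """Unpack a .tar.gz archive, using the 'data' extraction filter where available."""
    Path(destination).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(destination, filter="data")
        else:
            tar.extractall(destination)
```

A typical sequence is to call `verify_md5` on the downloaded archive and only call `extract_archive` when it returns `True`. The `filter="data"` branch blocks absolute paths and path traversal on Python versions that ship tarfile extraction filters.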