There is a newer version of the record available.

Published November 6, 2024 | Version 1.3
Dataset | Open access

gapseq reference sequence databases for Bacteria and Archaea

  • 1. Max Planck Institute for Evolutionary Biology
  • 2. Christian-Albrechts-Universität zu Kiel

Description

This record contains the reference protein sequences that gapseq uses to predict the presence of metabolic reactions and to construct metabolic models.

The following gapseq workflow was used to generate this set of reference protein sequences:

 

```sh

# delete all "old" data
rm dat/seq/Bacteria/rev/*.fasta
rm dat/seq/Bacteria/unrev/*.fasta
rm dat/seq/Bacteria/rxn/*.fasta
rm dat/seq/Archaea/rev/*.fasta
rm dat/seq/Archaea/unrev/*.fasta
rm dat/seq/Archaea/rxn/*.fasta

# run gapseq find to re-download everything
# the input genome is irrelevant because no BLAST search is performed ('-x')
gapseq find -p all -t Bacteria -n -x -U toy/ecoli.faa.gz > bac_update.log 2>&1
gapseq find -p all -t Archaea -n -x -U toy/ecoli.faa.gz > ar_update.log 2>&1

# create all sequence .tar.gz archives (rev/unrev/rxn);
# each subshell returns to the starting directory even if a step fails
(cd dat/seq/Bacteria/rev/ && tar -czvf sequences.tar.gz ./*.fasta)
(cd dat/seq/Bacteria/unrev/ && tar -czvf sequences.tar.gz ./*.fasta)
(cd dat/seq/Bacteria/rxn/ && tar -czvf sequences.tar.gz ./*.fasta)
(cd dat/seq/Archaea/rev/ && tar -czvf sequences.tar.gz ./*.fasta)
(cd dat/seq/Archaea/unrev/ && tar -czvf sequences.tar.gz ./*.fasta)
(cd dat/seq/Archaea/rxn/ && tar -czvf sequences.tar.gz ./*.fasta)

# create md5sum table for all tar.gz archives
cd dat/seq/
find . -mindepth 2 -type f -name "*.tar.gz" -exec md5sum {} \; > md5sums.txt

# create taxon-specific final archives for Zenodo upload
tar -czvf Bacteria.tar.gz Bacteria/*/*.tar.gz
tar -czvf Archaea.tar.gz Archaea/*/*.tar.gz

# Upload Bacteria.tar.gz, Archaea.tar.gz, and md5sums.txt to Zenodo via the web interface

```
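The packing steps above can be reversed to restore the `<taxon>/{rev,unrev,rxn}/*.fasta` layout from the uploaded archives. The following is a minimal sketch, not part of gapseq itself: it assumes that the downloaded `Bacteria.tar.gz`, `Archaea.tar.gz` and `md5sums.txt` sit in the current directory (standing in for `dat/seq/`) and that GNU coreutils `md5sum` is available. The function name `unpack_gapseq_seqs` is hypothetical. Because the `find` command above ran inside `dat/seq/`, `md5sums.txt` lists paths such as `./Bacteria/rev/sequences.tar.gz`, so each inner archive is checked against its entry before its FASTA files are extracted.

```sh
#!/bin/sh
# Hypothetical helper: unpack gapseq reference sequence archives for the given
# taxa. Assumes the current directory plays the role of dat/seq/.
unpack_gapseq_seqs() {
  for taxon in "$@"; do
    # Outer archive holds <taxon>/<rev|unrev|rxn>/sequences.tar.gz
    tar -xzf "$taxon.tar.gz" || return 1

    # md5sums.txt stores paths as ./<taxon>/<dir>/sequences.tar.gz;
    # verify only this taxon's entries, relative to the current directory
    grep " \./$taxon/" md5sums.txt | md5sum -c - || return 1

    # Each inner archive was created from ./*.fasta, so extract it in place
    for dir in "$taxon"/*/; do
      (cd "$dir" && tar -xzf sequences.tar.gz) || return 1
    done
  done
}

# Usage (file names as produced by the workflow above):
# unpack_gapseq_seqs Bacteria Archaea
```

Checking only the lines for the requested taxon lets a single taxon be unpacked without `md5sum -c` failing on entries for archives that were not downloaded.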

Files

Total size: 632.8 MB

  • Archaea.tar.gz: 40.3 MB (md5:e5deae8253c9d7add238c3525e05ba35)
  • Bacteria.tar.gz: 592.5 MB (md5:54a4ec95f83b62ac9fcb2b876862e224)
  • md5sums.txt: 397 bytes (md5:219040a17de448b19cbabbb9881ec71f)
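After downloading, the three files can be checked against the MD5 checksums listed for this record before unpacking. A minimal sketch, assuming GNU coreutils `md5sum`; the helper name `check_listed_md5` is hypothetical. It only tests whether each file's checksum appears among the listed values, so it does not depend on which checksum belongs to which file.

```sh
#!/bin/sh
# MD5 checksums as listed in the Files section of this record (version 1.3)
EXPECTED_MD5="e5deae8253c9d7add238c3525e05ba35
54a4ec95f83b62ac9fcb2b876862e224
219040a17de448b19cbabbb9881ec71f"

# Hypothetical helper: succeed only if the MD5 of every given file is one of
# the listed checksums (no file-to-checksum pairing is assumed)
check_listed_md5() {
  rc=0
  for f in "$@"; do
    # md5sum prints "<hash>  <file>"; keep only the hash field
    actual=$(md5sum "$f" | cut -d ' ' -f 1)
    if printf '%s\n' "$EXPECTED_MD5" | grep -qx "$actual"; then
      echo "OK: $f"
    else
      echo "MISMATCH: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage, run in the download directory:
# check_listed_md5 Bacteria.tar.gz Archaea.tar.gz md5sums.txt
```

A missing file yields an empty hash, which matches none of the listed values and is therefore reported as a mismatch.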

Additional details

Related works

Is described by
Journal article: 10.1186/s13059-021-02295-1 (DOI)