gapseq reference sequence databases for Bacteria and Archaea
Description
The repository contains the protein sequences used by gapseq to predict the presence of metabolic reactions and to construct metabolic models.
This set of reference protein sequences was generated with gapseq using the following workflow:
```sh
# delete all "old" data
rm dat/seq/Bacteria/rev/*.fasta
rm dat/seq/Bacteria/unrev/*.fasta
rm dat/seq/Bacteria/rxn/*.fasta
rm dat/seq/Archaea/rev/*.fasta
rm dat/seq/Archaea/unrev/*.fasta
rm dat/seq/Archaea/rxn/*.fasta
# run gapseq find to re-download everything
# the genome is irrelevant as no blasting is performed ('-x')
gapseq find -p all -t Bacteria -n -x -U toy/ecoli.faa.gz > bac_update.log 2>&1
gapseq find -p all -t Archaea -n -x -U toy/ecoli.faa.gz > ar_update.log 2>&1
# create all sequence .tar.gz archives (rev/unrev/rxn)
cd dat/seq/Bacteria/rev/ && tar -czvf sequences.tar.gz ./*.fasta && cd ../../../../
cd dat/seq/Bacteria/unrev/ && tar -czvf sequences.tar.gz ./*.fasta && cd ../../../../
cd dat/seq/Bacteria/rxn/ && tar -czvf sequences.tar.gz ./*.fasta && cd ../../../../
cd dat/seq/Archaea/rev/ && tar -czvf sequences.tar.gz ./*.fasta && cd ../../../../
cd dat/seq/Archaea/unrev/ && tar -czvf sequences.tar.gz ./*.fasta && cd ../../../../
cd dat/seq/Archaea/rxn/ && tar -czvf sequences.tar.gz ./*.fasta && cd ../../../../
# create md5sum table for all tar.gz archives
cd dat/seq/
find . -mindepth 2 -type f -name "*.tar.gz" -exec md5sum {} \; > md5sums.txt
# create taxon-specific final archive for Zenodo upload
tar -czvf Bacteria.tar.gz Bacteria/*/*.tar.gz
tar -czvf Archaea.tar.gz Archaea/*/*.tar.gz
# upload Bacteria.tar.gz, Archaea.tar.gz, and md5sums.txt to Zenodo via the web interface
```
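Because `md5sums.txt` is written from inside `dat/seq/` before the taxon-level archives are created, it lists the inner `sequences.tar.gz` files by relative path. A downloaded copy can therefore be checked after unpacking the two taxon archives. The following is a minimal sketch, not part of the original workflow: the function name `verify_seqdb` is hypothetical, and it assumes GNU `md5sum` and that all three files were downloaded into the directory passed as the first argument.

```sh
# Hypothetical helper (not part of gapseq): unpack both taxon archives
# and verify the inner sequences.tar.gz files against md5sums.txt.
# $1: directory holding Bacteria.tar.gz, Archaea.tar.gz, and md5sums.txt
verify_seqdb() {
  (
    # the subshell keeps the caller's working directory unchanged
    cd "$1" || exit 1
    tar -xzf Bacteria.tar.gz &&
    tar -xzf Archaea.tar.gz &&
    # paths in md5sums.txt are relative to this directory
    md5sum -c md5sums.txt
  )
}
```

Calling `verify_seqdb path/to/download` prints one status line per inner archive and returns a non-zero exit status if any checksum does not match.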
Files (632.8 MB)
| MD5 checksum | Size |
|---|---|
| e5deae8253c9d7add238c3525e05ba35 | 40.3 MB |
| 54a4ec95f83b62ac9fcb2b876862e224 | 592.5 MB |
| 219040a17de448b19cbabbb9881ec71f | 397 Bytes |
Additional details
Related works
- Is described by: Journal article, DOI 10.1186/s13059-021-02295-1