sORFdb
- 1. Bioinformatics and Systems Biology, Justus Liebig University Giessen, Giessen, 35392, Germany
Description
This repository contains the data of the sORFdb database.
sORFdb is a dedicated taxon-independent database for short open reading frames (sORFs) and small proteins as well as their functions in bacteria.
It combines high-quality sORF and protein sequences and enriches them with additional information, such as physicochemical properties and the presence of ribosomal binding sites. Small protein families were identified and can be used for sequence search and further studies.
The complete database with all small proteins is available as a TSV file, and the sequence data (sORFs and small proteins) of all entries in the database are available as compressed FASTA files:
- sorfdb.tsv.gz: sORFdb database
- sorfdb.faa.gz: Small proteins
- sorfdb.fna.gz: sORFs
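The files listed above can be read without decompressing them to disk first. The following is a minimal sketch using only the Python standard library. The function names `read_fasta_gz` and `read_tsv_gz` are illustrative and not part of sORFdb, and the sketch assumes that the first line of the TSV file holds the column names, which should be checked against the actual file.

```python
import csv
import gzip
from typing import Dict, Iterator, Tuple


def read_fasta_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file."""
    header = None
    chunks = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            else:
                chunks.append(line)
    # Emit the last record in the file.
    if header is not None:
        yield header, "".join(chunks)


def read_tsv_gz(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per row of a gzip-compressed TSV file.

    Assumption: the first line contains the column names.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        yield from reader
```

With the files in the working directory, `read_fasta_gz("sorfdb.faa.gz")` would iterate over the small proteins and `read_tsv_gz("sorfdb.tsv.gz")` over the database rows. Both functions stream their input, so the whole file never has to fit in memory.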
For the small protein families, a TSV file with small protein sequences, family IDs, and family statistics is provided, along with a compressed ASCII HMM file (HMMER3/f 3.4):
- sorfdb.families.tsv.gz: Small protein families
- sorfdb.hmm.gz: ASCII HMM file
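To get a quick overview of the profiles in the HMM file, the ASCII format can be scanned directly. The sketch below relies on the general HMMER3 ASCII layout, in which each profile carries a `NAME` and a `LENG` header line and ends with a `//` line. The function name `iter_hmm_profiles` is illustrative, and the sketch does not parse the model parameters themselves.

```python
import gzip
from typing import Iterator, Tuple


def iter_hmm_profiles(path: str) -> Iterator[Tuple[str, int]]:
    """Yield (name, model length) for each profile in a gzipped HMMER3 ASCII file."""
    name = None
    length = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("NAME "):
                # Header line of the form "NAME  <profile name>"
                name = line.split(None, 1)[1].strip()
            elif line.startswith("LENG "):
                # Header line of the form "LENG  <number of match states>"
                length = int(line.split()[1])
            elif line.startswith("//"):
                # End of one profile record.
                if name is not None and length is not None:
                    yield name, length
                name, length = None, None
```

For example, `sum(1 for _ in iter_hmm_profiles("sorfdb.hmm.gz"))` would count the profiles, assuming the file is in the working directory.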