Published February 21, 2024 | Version 1.0
Dataset Open

sORFdb

  • 1. Bioinformatics and Systems Biology, Justus Liebig University Giessen, Giessen, 35392, Germany

Description

This repository contains the data of the sORFdb database.

 

sORFdb is a dedicated taxon-independent database for short open reading frames (sORFs) and small proteins as well as their functions in bacteria.

It combines high-quality sORF and protein sequences and enriches them with additional information, such as physicochemical properties and the presence of ribosomal binding sites. Small protein families were identified and can be used for sequence searches and further studies.

 

The complete database with all small proteins is available as a TSV file, and the sequence data (sORFs and small proteins) of all entries in the database are available as compressed FASTA files:

  • sorfdb.tsv.gz: sORFdb database
  • sorfdb.faa.gz: Small proteins
  • sorfdb.fna.gz: sORFs
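The compressed files listed above can be streamed without unpacking them first. The following is a minimal sketch using only the Python standard library. It assumes the files are gzip-compressed UTF-8 text and that the first row of the TSV is a header row. The function names `read_fasta` and `peek_tsv` are hypothetical helpers, and no column names are assumed.

```python
import csv
import gzip
from typing import Iterator, List, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file."""
    header, chunks = None, []
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def peek_tsv(path: str, n: int = 5) -> Tuple[List[str], List[List[str]]]:
    """Return the first row (assumed to be a header) and up to n further rows."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary text (an assumption).
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader, [])
        rows = [row for _, row in zip(range(n), reader)]
    return header, rows
```

For example, `peek_tsv("sorfdb.tsv.gz", 3)` would return the header and the first three rows, and `next(read_fasta("sorfdb.faa.gz"))` the first small protein record, assuming both files have been downloaded into the working directory. Both functions read line by line, so memory use stays small even for the multi-gigabyte files.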

For the small protein families, a TSV file with small protein sequences, family IDs, and family statistics is provided, along with a compressed ASCII HMM file (HMMER3/f 3.4):

  • sorfdb.families.tsv.gz: Small protein families
  • sorfdb.hmm.gz: ASCII HMM file
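To get an overview of the profiles in the HMM file, the header fields of each model can be listed without loading HMMER itself. The sketch below relies on the general layout of the HMMER3 ASCII format (each profile starts with a `HMMER3/` line, carries tagged header lines such as `NAME` and `LENG`, and ends with `//`). Which optional fields (`ACC`, `DESC`) are present in sorfdb.hmm.gz is not stated here, so the code treats them as optional. The function name `iter_hmm_headers` is a hypothetical helper.

```python
import gzip
from typing import Dict, Iterator


def iter_hmm_headers(path: str) -> Iterator[Dict[str, str]]:
    """Yield selected header fields for each profile in a gzipped HMMER3 ASCII file."""
    wanted = {"NAME", "ACC", "DESC", "LENG"}
    record: Dict[str, str] = {}
    in_header = False
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("HMMER3/"):
                # Format/version line: a new profile begins.
                record, in_header = {}, True
            elif line.startswith("//"):
                # End-of-profile marker.
                if record:
                    yield record
                record, in_header = {}, False
            elif in_header:
                parts = line.split(None, 1)
                if not parts:
                    continue
                if parts[0] == "HMM":
                    # The model body starts here; stop reading header tags.
                    in_header = False
                elif parts[0] in wanted and len(parts) == 2:
                    record[parts[0]] = parts[1].strip()
```

Calling `sum(1 for _ in iter_hmm_headers("sorfdb.hmm.gz"))` would count the profiles in a downloaded copy of the file. The `LENG` value is returned as a string and needs converting with `int()` before numeric use.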

 

Files

Files (10.3 GB)

Size      Checksum
1.5 GB    md5:03058d38128043b73897ba2c2ab8750d
3.8 MB    md5:bac586c8a90a4d246581ea6b67364e36
2.1 GB    md5:3f5dc9554662262996d05ea09edfb104
31.2 MB   md5:6574aa9b958c50c8220b6ae89e993322
6.7 GB    md5:9a276616c3ce9726b5f3cc3246f14439
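The MD5 checksums listed above can be used to confirm that a download is complete and uncorrupted. A minimal sketch with Python's `hashlib` follows. The function names `md5_of_file` and `matches_checksum` are hypothetical helpers, and because the listing does not show which checksum belongs to which file name, the pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hexadecimal MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reading keeps memory use flat for multi-gigabyte files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 digest with an expected value, with or without an 'md5:' prefix."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[4:]
    return md5_of_file(path) == expected_hex
```

A call such as `matches_checksum(local_path, "md5:...")`, with the checksum string copied from the listing, returns `True` only if the local file matches byte for byte.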