Published December 16, 2019 | Version 1.0
Dataset Open

Enzymes from the BRENDA and CAZy databases annotated with organism growth temperatures and predicted Topt

  • 1. Chalmers University of Technology

Description

This repository is an updated version of an earlier dataset: Gang Li, & Martin KM Engqvist. (2019). Enzymes from the BRENDA database annotated with organism growth temperatures and predicted Topt (Version 1.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.2539114

Experimental as well as predicted organism growth temperatures were used to annotate enzymes from the BRENDA database (doi: 10.1093/nar/gky1048, https://www.brenda-enzymes.org), version 2018.2 (July 2018), and from the CAZy database (http://www.cazy.org/).

An updated machine learning model was applied to predict the optimal functional temperature of enzymes from BRENDA and CAZy. 

There are four files in this repo:

1. 'annotated_brenda.tsv' is a tab-separated file that contains the annotated enzymes from BRENDA. There are 9 columns in the file: index column; "ec", EC number; "uniprot_id", protein ID in the UniProt database; "domain", the domain of life (superkingdom), either Archaea, Bacteria, or Eukarya; "organism", species name; "ogt", optimal growth temperature of the organism; "ogt_note", whether the experimental or predicted ogt is used; "topt", the optimal functional temperature of the enzyme; "topt_note", whether the experimental or predicted topt is used.

2. 'annotated_cazy.tsv' is a tab-separated file that contains the annotated enzymes from CAZy. There are 12 columns in the file: index column; "family", CAZy family ID; "genbank", GenBank ID; "Protein Name", the protein name from the CAZy database; "ec", EC number; "organism", strain name; "uniprot_id", protein ID in the UniProt database; "PDB/3D", structure ID in the PDB database; "ogt", optimal growth temperature of the organism; "ogt_note", whether the experimental or predicted ogt is used; "topt", the optimal functional temperature of the enzyme; "topt_note", whether the experimental or predicted topt is used.

3. 'brenda.sql' is a SQLite3 database version of 'annotated_brenda.tsv', with an additional column of enzyme sequences.

4. 'cazy.sql' is a SQLite3 database version of 'annotated_cazy.tsv', with an additional column of enzyme sequences.
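
As a minimal sketch of how the tab-separated files described above might be read, the following Python snippet uses only the standard library. The column name "topt" is taken from the description. The function names (read_annotated, filter_by_topt, count_rows_above), the UTF-8 encoding, and the assumption that "topt" values parse as plain numbers are illustrative and are not part of the dataset. Rows with a missing or non-numeric "topt" are skipped.

```python
import csv
from typing import Dict, Iterable, Iterator, TextIO


def read_annotated(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per row of an open tab-separated file with a header line."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield row


def filter_by_topt(rows: Iterable[Dict[str, str]], min_topt: float) -> Iterator[Dict[str, str]]:
    """Keep rows whose "topt" value is a number greater than or equal to min_topt."""
    for row in rows:
        try:
            topt = float(row["topt"])
        except (KeyError, TypeError, ValueError):
            # Missing column, empty cell, or non-numeric text: skip the row.
            continue
        if topt >= min_topt:
            yield row


def count_rows_above(path: str, min_topt: float) -> int:
    """Count rows in a file on disk (hypothetical helper; UTF-8 is assumed)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return sum(1 for _ in filter_by_topt(read_annotated(handle), min_topt))
```

For example, count_rows_above("annotated_brenda.tsv", 70.0) would count the BRENDA entries whose "topt" is at least 70, provided the file has been downloaded to the working directory. Because the rows are streamed, the file never has to fit in memory at once.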

The SQLite3 databases are intended for use with the Tome tool (https://github.com/EngqvistLab/Tome), version 2.0.
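
The table and column names inside the SQLite3 databases are not documented in this description, so the sketch below does not assume any. It only shows one way to inspect an unknown SQLite3 file with Python's built-in sqlite3 module. The helper names list_tables and column_names are hypothetical.

```python
import sqlite3
from typing import List


def list_tables(conn: sqlite3.Connection) -> List[str]:
    """Return the names of all tables in the database, sorted alphabetically."""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in cursor]


def column_names(conn: sqlite3.Connection, table: str) -> List[str]:
    """Return the column names of one table, in declaration order."""
    # PRAGMA statements do not accept bound parameters, so the identifier
    # is quoted by hand, doubling any embedded double quotes.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"PRAGMA table_info({quoted})")
    # Each result row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in cursor]
```

After downloading, something like conn = sqlite3.connect("brenda.sql") followed by list_tables(conn) should reveal the actual schema, which can then be queried with ordinary SQL.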

Files

Files (4.8 GB)

Checksum Size
md5:71db7d042e9e8dddbd3a42a39a6a7234 623.9 MB
md5:5c0675fb86bbfcbcefb0619c678aa884 107.1 MB
md5:356812b7362d00e4f9f264f65fbd6376 3.5 GB
md5:715542dd53c733a76b241d8313aaa57f 601.5 MB
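
Given the file sizes, it may be worth checking a download against its listed md5 value. The following is a minimal sketch using Python's hashlib. The function names md5_of_file and matches_checksum and the 1 MiB chunk size are illustrative choices, and which checksum belongs to which file has to be taken from the record itself.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file against a checksum string, with or without an 'md5:' prefix."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use constant even for the multi-gigabyte database file.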

Additional details

Funding

PAcMEN – Predictive and Accelerated Metabolic Engineering Network 722287
European Commission