
Published June 6, 2022 | Version 1.0
Dataset Open

AuCoMe: inferring and comparing metabolisms across heterogeneous sets of annotated genomes

  • 1. Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France
  • 2. Sorbonne Université, CNRS, Integrative Biology of Marine Models (LBI2M), Station Biologique de Roscoff (SBR), 29680 Roscoff, France
  • 3. Inria, INRAE, Université de Bordeaux, France

Description

CONTENT OF THIS ARCHIVE
The Zenodo archive is composed of one file and two main directories:

 * analyses: This directory contains all tabulated files used to create the figures and results of the paper.

 * datasets: This directory gathers all datasets on which AuCoMe was run: the bacterial, fungal, and algal datasets, and the 32 synthetic datasets, which contain an E. coli K–12 MG1655 genome to which various degradations were applied, together with 28 other bacterial genomes.
   
 * metacyc 23.5.padmet: This is version 23.5 of the MetaCyc database (https://metacyc.org/) in the PADMET format. AuCoMe used it to reconstruct all the metabolic networks, so this file is required to reproduce our work.


1/ Content of the analyses subdirectory
 * figure_2_bacterial_nb_reactions.tsv: For each species of the bacterial dataset, this file gives the number of reactions at each AuCoMe step. It was used to plot Fig 2B of this paper.
   
 * figure_2_fungal_nb_reactions.tsv: For each species of the fungal dataset, this file gives the number of reactions at each AuCoMe step. It was used to plot Fig 2C of this article.

 * figure_2_algal_nb_reactions.tsv: For each species of the algal dataset, this file gives the number of reactions at each AuCoMe step. It was used to plot Fig. 2D of this paper.
   
 * figure_3_nb_reactions_step.tsv: For each run on the 32 synthetic bacterial datasets, this file gives the number of reactions at each AuCoMe step. It was used to plot Fig 3A of this article.

 * figure_3_fmeasure_steps.tsv: For each run on the 32 synthetic bacterial datasets, this file gives the F-measures obtained by comparing the GSMN recovered for each E. coli K–12 MG1655 genome replicate with the gold-standard network. It was used to plot Fig 3B of this paper.

 * figure_S1_Deepec_fungal.tsv: For each species of the fungal dataset, several measures were computed at each AuCoMe step (robust orthology, non-robust orthology, and annotation or orthology): the number of reactions, the number of ECs, the number of ECs validated by DeepEC, and the ratio of the number of ECs validated by DeepEC to the number of ECs. It was used to design Fig. S3(a) of this article.

 * figure_S1_Deepec_algal.tsv: For each species of the algal dataset, several measures were computed at each AuCoMe step (robust orthology, non-robust orthology, and annotation or orthology): the number of reactions, the number of ECs, the number of ECs validated by DeepEC, and the ratio of the number of ECs validated by DeepEC to the number of ECs. It was used to design Fig. S3(a) of this paper.
   
 * SuplFile_o-Aminophenol_reactions.ods: This file comprises three tables (S9, S10, and S11) with more detail, such as the amino acid sequences in S11.
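
The F-measures stored in figure_3_fmeasure_steps.tsv compare a recovered network with the gold-standard network. The archive description does not spell out the formula, so the following is only a minimal sketch, assuming that the two networks are compared as sets of reaction identifiers and that the F-measure is the harmonic mean of precision and recall. The function name and the reaction identifiers are hypothetical.

```python
def f_measure(predicted, gold):
    """Harmonic mean of precision and recall between two sets of reaction IDs.

    Assumption: networks are compared on their reaction sets only.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reaction identifiers, for illustration only.
gold_network = {"RXN-A", "RXN-B", "RXN-C", "RXN-D"}
recovered_network = {"RXN-A", "RXN-B", "RXN-E"}

score = f_measure(recovered_network, gold_network)
print(round(score, 3))
```

Here two of the three recovered reactions are in the gold standard (precision 2/3) and two of the four gold-standard reactions are recovered (recall 1/2).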


2/ Content of the datasets subdirectory

2.1/ Content of the algal, bacterial, and fungal directories
These three directories are composed of 8 subdirectories:

 * FASTA: It contains the proteome of each species as a FASTA file.

 * cleaned_GBKs: For each species, it contains the annotated genome, with the protein sequences as a GenBank file.

 * dictionaries: For some species, genes needed to be renamed for compatibility reasons. In this case, a CSV file with the old gene names and the new ones is provided.

 * annotated_DATs: It contains a subdirectory per species with all the output files from Pathway Tools v23.5, without any post-treatment, in the DAT format.

 * annotated_PADMETs: For each species, it contains a metabolic network of the draft reconstruction step of AuCoMe, in the PADMET format.
   
 * final_SBMLs: For each species, it contains a metabolic network generated by the AuCoMe workflow, in the SBML format.
   
 * final_PADMETs: For each species, it contains a metabolic network generated by the AuCoMe workflow, in the PADMET format.
   
 * panmetabolism: It is composed of 7 files describing the final metabolic networks:
   
     – genes.tsv: This table contains, for each organism, the list of genes and the associated reactions.
       
     – metabolites.tsv: This table contains the list of metabolites present in the panmetabolism. Then, for each metabolite and for each organism, it lists the reactions that produced this compound and the reactions that consumed it.

     – pathways.tsv: This table contains the list of pathways present in the panmetabolism. For each pathway and for each organism, it indicates the number of reactions present in this pathway, and the names of these reactions.

     – reactions.tsv: This table contains the list of reactions present in the panmetabolism. For each reaction, it indicates whether or not it is present in each organism. If a reaction is found in a species, the genes associated with the reaction are also listed.

     – pvclust_reaction_dendrogram.png: This dendrogram is based on the presence/absence matrix of reactions in the different species of the dataset. Jaccard distances between these species were computed, and hierarchical clustering with complete linkage was applied to them. The R package pvclust was used to create the dendrogram with multiscale bootstrap resampling. For each node, a p-value indicates how strongly the cluster is supported by the data. This dendrogram is provided as a PNG picture.
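
The distance and clustering steps behind the dendrogram can be illustrated in a few lines. The dendrogram in this archive was produced with the R package pvclust; the sketch below is not that code. It is a naive pure-Python illustration of Jaccard distances followed by complete-linkage agglomeration, without the bootstrap resampling, and the species names and reaction identifiers are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| for two sets of reactions."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def complete_linkage(reactions_by_species):
    """Naive agglomerative clustering; returns the merges in the order made."""

    def cluster_distance(pair):
        left, right = pair
        # Complete linkage: distance between the two farthest members.
        return max(
            jaccard_distance(reactions_by_species[x], reactions_by_species[y])
            for x in left
            for y in right
        )

    clusters = [frozenset([name]) for name in sorted(reactions_by_species)]
    merges = []
    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2), key=cluster_distance)
        distance = cluster_distance((left, right))
        merges.append((sorted(left), sorted(right), distance))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left | right)
    return merges


# Hypothetical presence sets: the reactions found in each species.
presence = {
    "species_A": {"R1", "R2", "R3"},
    "species_B": {"R1", "R2", "R4"},
    "species_C": {"R5", "R6"},
}

merges = complete_linkage(presence)
for left, right, distance in merges:
    print(left, right, distance)
```

The naive loop recomputes distances at every merge, which is acceptable for a few dozen species but would be slow for large datasets.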


2.2/ Content of the synthetic bacterial directory
The synthetic bacterial directory contains 32 subdirectories named Run 00, Run 01, ..., Run 31. Each subdirectory is composed of 9 files:

 * K_12_MG1655.gbk: The annotated genome of E. coli K–12 MG1655 to which degradation of the functional and/or structural annotations was applied.

 * annotated_K_12_MG1655.sbml: The metabolic network of E. coli K–12 MG1655 output of the draft reconstruction step of AuCoMe in the SBML format.

 * annotated_K_12_MG1655.padmet: The metabolic network of E. coli K–12 MG1655 output of the draft reconstruction step of AuCoMe in the PADMET format.
   
 * orthology_K_12_MG1655.sbml: The metabolic network of E. coli K–12 MG1655 output of the orthology propagation step of AuCoMe in the SBML format.
   
 * orthology_K_12_MG1655.padmet: The metabolic network of E. coli K–12 MG1655 output of the orthology propagation step of AuCoMe in the PADMET format.
   
 * structural_K_12_MG1655.sbml: The metabolic network of E. coli K–12 MG1655 output of the structural verification step of AuCoMe in the SBML format.

 * structural_K_12_MG1655.padmet: The metabolic network of E. coli K–12 MG1655 output of the structural verification step of AuCoMe in the PADMET format.
   
 * final_K_12_MG1655.sbml: The metabolic network of E. coli K–12 MG1655 output of the AuCoMe workflow in the SBML format.
   
 * final_K_12_MG1655.padmet: The metabolic network of E. coli K–12 MG1655 output of the AuCoMe workflow in the PADMET format.
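
The naming scheme described above is regular enough to enumerate, which can help when checking that an extracted copy of the synthetic datasets is complete. This is a minimal sketch that builds the expected relative paths. The directory spelling "Run 00" is taken verbatim from this description and may differ in the archive itself (for example, an underscore instead of a space); the function name is hypothetical.

```python
from pathlib import PurePosixPath

# The four AuCoMe outputs listed above, each in two formats.
STEPS = ["annotated", "orthology", "structural", "final"]
FORMATS = ["sbml", "padmet"]


def expected_run_files(run_index):
    """Relative paths of the 9 files expected in one run subdirectory."""
    # Assumption: directories are literally named "Run 00" ... "Run 31".
    run_dir = PurePosixPath(f"Run {run_index:02d}")
    names = ["K_12_MG1655.gbk"]
    names += [f"{step}_K_12_MG1655.{ext}" for step in STEPS for ext in FORMATS]
    return [run_dir / name for name in names]


all_files = [path for i in range(32) for path in expected_run_files(i)]
print(len(all_files))
```

Comparing this list with the files actually present on disk would show which runs, if any, are incomplete.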

Files

AuCoMe_Supplementary_data.zip (5.5 GB)
md5:5e5e8112f8e8dab87727f8e7cf281fa9
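
Since the archive is large, it may be worth checking a download against the MD5 checksum listed above before extracting it. The following is a minimal sketch using Python's standard hashlib module. The function names are hypothetical, and the local path in the usage note is an assumption about where the file was saved.

```python
import hashlib

# Checksum listed for AuCoMe_Supplementary_data.zip in this record.
EXPECTED_MD5 = "5e5e8112f8e8dab87727f8e7cf281fa9"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` has the expected MD5 digest."""
    return md5_of_file(path) == expected
```

For example, `archive_is_intact("AuCoMe_Supplementary_data.zip")` should return True for an uncorrupted download saved in the current directory. Reading in chunks avoids loading the whole 5.5 GB file into memory.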