Published January 4, 2022 | Version v1
Dataset | Open access

Supplementary files for Reconstructing Kinetic Models for Dynamical Studies of Metabolism using Generative Adversarial Networks, main part

  • 1. EPFL (École Polytechnique Fédérale de Lausanne), Lausanne, Switzerland

Description

Supplementary files containing datasets needed to reproduce the results of the manuscript "Reconstructing Kinetic Models for Dynamical Studies of Metabolism using Generative Adversarial Networks" by S. Choudhury et al.

The code to use with these data and reproduce the manuscript results is available at https://github.com/EPFL-LCSB/rekindle and https://gitlab.com/EPFL-LCSB/rekindle. Parts of this code depend on the SkimPy toolbox (https://github.com/EPFL-LCSB/skimpy). Refer to the README files in the REKINDLE code repositories for more details.

Datasets:

  • models.zip - Datasets parameterizing nonlinear kinetic models of a wild-type E. coli strain, used for training the generative adversarial networks
    • subfolder 1: kinetic - contains the kinetic model (kin_varma_curated.yml)
    • subfolder 2: thermo - contains the thermodynamic models for all four physiologies (varma_fdp1, varma_fdp2, varma_fdp3, varma_fdp4)
    • subfolder 3: steady_state_samples - contains the TFA (thermodynamics-based flux analysis) steady-state profiles for all four physiologies (samples_fdp1, sample_fdp2, samples_fdp3, samples_fdp4)
    • subfolder 4: parameters - contains the kinetic parameter training datasets for each physiology (.hdf5 files), the maximal eigenvalues used as training labels (maximal_eigenvalues.csv) and the minimal eigenvalues (minimal_eigenvalues.csv)
  • vanilla_learning_training.zip - contains 4 folders, one for each of the 4 physiologies.
    • each of these folders contains 6 subsubfolders named N-{n} (N-10, N-50, N-100, N-500, N-1000, N-72000), where {n} is the number of training samples used.
    • every subsubfolder N-{n} contains 5 repeat folders. Each repeat folder contains:
      • E_-1.npy - GAN-generated kinetic parameters at the E-th epoch
      • E_-1_max_eig.csv - the maximal eigenvalues of the Jacobian for E_-1.npy (Note: eigenvalues were not calculated for N = 10, 50 and 100, as training failed for these cases.)
  • transfer_learning_training.zip - contains 12 subfolders "tl_fdpi_fdpj", where i, j ∈ {1,2,3,4} and i ≠ j, one for each of the 12 transfer learning cases
    • each of these folders contains 5 subsubfolders: N-10, N-50, N-100, N-500, N-1000
    • every subsubfolder N-{n} contains 5 repeat folders. Each repeat folder contains:
      • E_-1.npy - GAN-generated kinetic parameters at the E-th epoch
      • E_-1_max_eig.csv - the maximal eigenvalues of the Jacobian for E_-1.npy
  • best_generators.zip
    • The best generators (with the highest incidence of relevant models) for each physiology (generator1.h5 to generator4.h5)
    • The normalizing scaling parameters for each generator (d_scaling.pkl).
  • Temporal evolution of perturbations in non-linear ordinary differential equations
    • vanilla_ODE_sample_parameters.zip - contains (i) 1000 REKINDLE-generated kinetic parameter sets for each of the 4 physiologies and their corresponding eigenvalues (4 in total) and (ii) 1000 ORACLE-generated kinetic parameter sets for each of the 4 physiologies and their corresponding eigenvalues (4 in total). These parameter sets parameterize the ODEs that were integrated.
    • ode_solutions_physiology1.zip (available at https://zenodo.org/record/5818192) - contains 100 subfolders, each containing the time-series evolution data of 1000 kinetic models parameterized by REKINDLE-generated parameter sets for physiology 1, each of the 1000 models subjected to a random perturbation.
    • ode_solutions_physiology1_ORACLE.zip (available at https://zenodo.org/record/5819669) - contains 100 subfolders, each containing the time-series evolution data of 1000 kinetic models parameterized by ORACLE-generated parameter sets for physiology 1, each of the 1000 models subjected to a random perturbation.
    • ode_solutions_physiologies2-4.zip - contains 6 subfolders (physiology_2-4, physiology_2-4_ORACLE), each containing 10 subsubfolders. Each subsubfolder contains the time-series evolution data of 1000 kinetic models parameterized by REKINDLE- or ORACLE-generated parameter sets for physiologies 2-4, each of the 1000 models subjected to a random perturbation.
    • transfer_learning_ODE_solutions.zip - contains two subfolders, N_10 and N_50. Each subfolder contains 12 subsubfolders titled i_j (where i, j ∈ {1,2,3,4} and i ≠ j); for example, 1_2 represents the transfer learning case from physiology 2 to physiology 1 when using {n} samples from physiology 2, and so on ({n} = 10 and 50 for N_10 and N_50, respectively). Each subsubfolder contains:
      • i_j.hdf5: 300 kinetic parameter sets generated using REKINDLE for this transfer learning case
      • i_j.csv: the maximal eigenvalues of these parameter sets
      • solutions.csv: ODE-integrated time-series data for the relevant kinetic parameter sets among the 300 generated
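
The folder conventions above can be checked with a small Python sketch after unpacking the archives. It encodes only what this record states (physiology indices 1-4, the N-{n} and N_{n} folder names, 5 repeats per N-{n}, and the i_j convention where 1_2 means physiology 2 to physiology 1). The helper names are hypothetical, and because the individual repeat folder names are not listed here, the sketch counts repeats instead of naming them.

```python
from itertools import permutations

# Conventions taken from the dataset description (helper names are hypothetical).
PHYSIOLOGIES = (1, 2, 3, 4)
VANILLA_N = (10, 50, 100, 500, 1000, 72000)  # N-{n} subsubfolders per physiology
TL_N = (10, 50, 100, 500, 1000)              # N-{n} subsubfolders per transfer case
N_REPEATS = 5                                # repeat folders per N-{n}


def transfer_learning_training_cases():
    """Names of the 12 tl_fdpi_fdpj folders: ordered pairs (i, j) with i != j."""
    return [f"tl_fdp{i}_fdp{j}" for i, j in permutations(PHYSIOLOGIES, 2)]


def count_training_runs():
    """Repeat folders (one GAN run each) in the vanilla and transfer archives."""
    vanilla = len(PHYSIOLOGIES) * len(VANILLA_N) * N_REPEATS
    transfer = len(transfer_learning_training_cases()) * len(TL_N) * N_REPEATS
    return vanilla, transfer


def parse_tl_ode_case(name):
    """Split an i_j folder name from transfer_learning_ODE_solutions.zip.

    Per the description, 1_2 is transfer learning from physiology 2 to
    physiology 1, so the source is j and the target is i.
    """
    i, j = (int(part) for part in name.split("_"))
    return {"source": j, "target": i}


def tl_ode_files(n, name):
    """Relative paths of the three files described for N_{n}/{name}/."""
    folder = f"N_{n}/{name}"
    return [f"{folder}/{name}.hdf5", f"{folder}/{name}.csv", f"{folder}/solutions.csv"]


if __name__ == "__main__":
    print(transfer_learning_training_cases()[:3])
    print(count_training_runs())      # (120, 300)
    print(parse_tl_ode_case("1_2"))   # {'source': 2, 'target': 1}
    print(tl_ode_files(10, "1_2"))
```

The counts follow directly from the description: 4 physiologies × 6 sample sizes × 5 repeats = 120 vanilla runs, and 12 transfer cases × 5 sample sizes × 5 repeats = 300 transfer learning runs.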

 

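The maximal eigenvalues (maximal_eigenvalues.csv in models.zip, E_-1_max_eig.csv in the training archives) serve as training labels, and the best generators were selected by their incidence of relevant models. A minimal Python sketch of that bookkeeping follows, assuming that a parameter set counts as relevant when its maximal eigenvalue (taken as the largest real part) lies below a cutoff. The CSV layout (one eigenvalue per row in the last column, possibly after a header) and the cutoff value are assumptions, not details from this record; substitute the criterion given in the manuscript.

```python
import csv
import io

# Placeholder cutoff (assumption): replace with the value used in the manuscript.
HYPOTHETICAL_CUTOFF = -2.5


def read_max_eigenvalues(lines):
    """Parse maximal eigenvalues from an iterable of CSV lines.

    Assumes the eigenvalue is in the last column of each row; rows whose
    last field is not numeric (such as a header) are skipped.
    """
    values = []
    for row in csv.reader(lines):
        if not row:
            continue
        try:
            values.append(float(row[-1]))
        except ValueError:
            continue  # header or other non-numeric row
    return values


def relevance_labels(max_eigs, cutoff=HYPOTHETICAL_CUTOFF):
    """1 if the maximal eigenvalue lies below the cutoff, else 0."""
    return [1 if lam < cutoff else 0 for lam in max_eigs]


def incidence(labels):
    """Fraction of parameter sets labelled relevant (0.0 for an empty list)."""
    return sum(labels) / len(labels) if labels else 0.0


if __name__ == "__main__":
    demo = io.StringIO("idx,max_eig\n0,-3.1\n1,-0.4\n2,-2.6\n3,0.2\n")
    eigs = read_max_eigenvalues(demo)
    labels = relevance_labels(eigs)
    print(eigs)               # [-3.1, -0.4, -2.6, 0.2]
    print(labels)             # [1, 0, 1, 0]
    print(incidence(labels))  # 0.5
```

For a real file, pass an open handle instead of the in-memory demo, e.g. `with open("maximal_eigenvalues.csv", newline="") as fh: labels = relevance_labels(read_max_eigenvalues(fh))`.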
Files (48.9 GB)

  • md5:2f3a921ab5b8c1a601f7ac918932f47f (14.6 MB)
  • md5:43e36e8f42e03fe09ac3e126949f3c37 (875.6 MB)
  • md5:7987d2afdf34c9d99752d1efc3ff09c4 (26.8 GB)
  • md5:c164bf25e5df01d77e4635bdf79410b2 (3.2 GB)
  • md5:080e12d8a4f75b99a841ac34f4270f6e (2.6 GB)
  • md5:aabc14fe62e493d5f38b7f4f9ec3c187 (15.4 GB)
  • md5:87d9884bd807cdf37ac5161b9eec1c5c (21.5 MB)
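
Each archive in this record comes with an md5 checksum. A minimal Python sketch for verifying a download streams the file through hashlib in 1 MiB chunks, so even the 26.8 GB archive never has to fit in memory, and then checks whether the digest matches any of the listed checksums. Matching against the whole set, rather than a specific file's checksum, is enough to detect a truncated or corrupted download. The helper names md5sum and matches_listed_checksum are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

# md5 checksums as listed in this record.
LISTED_MD5 = {
    "2f3a921ab5b8c1a601f7ac918932f47f",
    "43e36e8f42e03fe09ac3e126949f3c37",
    "7987d2afdf34c9d99752d1efc3ff09c4",
    "c164bf25e5df01d77e4635bdf79410b2",
    "080e12d8a4f75b99a841ac34f4270f6e",
    "aabc14fe62e493d5f38b7f4f9ec3c187",
    "87d9884bd807cdf37ac5161b9eec1c5c",
}


def md5sum(path, chunk_size=1 << 20):
    """Hex md5 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path):
    """True if the file's md5 equals one of the checksums listed above."""
    return md5sum(path) in LISTED_MD5


if __name__ == "__main__":
    # Demo on a throwaway file; point md5sum at a downloaded archive instead.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo.bin"
        demo.write_bytes(b"abc")
        print(md5sum(demo))                   # 900150983cd24fb0d6963f7d28e17f72
        print(matches_listed_checksum(demo))  # False
```

To check a downloaded archive, call `matches_listed_checksum(Path("best_generators.zip"))`; a result of False points to an incomplete or corrupted download.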

Additional details

Funding

  • SHIKIFACTORY100 – Modular cell factories for the production of 100 compounds from the shikimate pathway (814408), European Commission
  • Computational Methods for modeling and analysis of large-scale metabolic networks (315230_163423), Swiss National Science Foundation
  • PAcMEN – Predictive and Accelerated Metabolic Engineering Network (722287), European Commission