Published November 14, 2021 | Version 1.1.2
Dataset Open

Genome-wide gene expression noise in Escherichia coli is condition-dependent and determined by propagation of noise through the regulatory network

  • 1. Biozentrum, University of Basel and Swiss Institute of Bioinformatics

Description

In this repository we provide raw and processed datasets for the article: “Genome-wide gene expression noise in Escherichia coli is condition-dependent and determined by propagation of noise through the regulatory network” by Arantxa Urchueguía, Luca Galbusera, Dany Chauvin, Gwendoline Bellement, Thomas Julou and Erik van Nimwegen.

A preprint is available under the following DOI: https://doi.org/10.1101/795369

The repository consists of the following datasets: 

1. preprocessed_datasets.zip (~22 GB)

  • This dataset contains raw data from the flow cytometry experiments (FACS Canto II, BD Biosciences) in all measured conditions, in RData format. Raw FCS files were processed with the tools described in the publication "Using fluorescence flow cytometry data for single-cell gene expression analysis in bacteria", published here: https://doi.org/10.1371/journal.pone.0240233. The tools themselves are available here: https://github.com/vanNimwegenLab/E-Flow. The files include the outputs of these processing tools together with all raw values that came directly from the flow cytometer. The file directory_structure_in_preprocessed describes how the files are organized.

2. info_files: This is a set of csv files containing detailed information about the experiments done to acquire the preprocessed_datasets as well as annotation files that we used to retrieve promoter information. 

3. processed_datasets: These files are the datasets derived from the raw RData files under 1 above. The processed data provide estimates of the mean and variance in fluorescence of E. coli promoters across the different growth conditions. Note that we discarded flow cytometry measurements from promoter/growth-condition combinations that showed abnormal fluorescence distributions (due to contamination), as well as measurements from reporters with annotation mismatches. The folder contains the following clean dataset files that were used in the paper:

  • FULL_dataset_mean_var_wreplicates: This dataset contains the processed means and variances (in both logarithmic and linear scale) of all promoters in each condition, including replicate measurements for some conditions. It also contains the name and Blattner number of the gene immediately downstream of each promoter, the DNA sequence of each promoter, and regulatory information (the number of unique transcription factor inputs and their names), which we obtained from RegulonDB v10.5 (https://doi.org/10.1093/nar/gky1077).
  • dataset_with_noise_estimates: This dataset provides noise estimates for all promoters expressed above an expression threshold (mean GFP fluorescence at least as large as autofluorescence). Note that each noise estimate corresponds to the difference between the promoter’s variance in log-expression and the minimal variance as a function of its mean expression (i.e. the so-called noise floor was subtracted). Apart from the mean, variance, noise and promoter features (sequence, name of the downstream gene, number of unique regulatory inputs and names of the binding TFs), we also include the parameters used for fitting the minimal noise, i.e. the noise floor, in each of the conditions.
  • time_course_data_SI: This dataset contains mean and variance measurements for one of the plates of the library, measured at different time points during growth in minimal media with 0.4 M NaCl: 0h (just after dilution), 1h, 2h, 3h, 5h, 6.5h, 8.5h, 10h and 11h.
  • growth_curves_SI: Growth data (OD600 as a function of time) for a subset of the promoters from the library across different growth conditions.
  • singlecell_areas_SI: Single-cell areas estimated using agar patches of cells growing in each condition. Each row of the table contains data for a single cell.
  • synthetic_promoters_dataset: This dataset contains mean, variance and noise measurements of a set of constitutive promoters from https://doi.org/10.7554/eLife.05856.001 across different conditions.
  • MARA_results: Transcription factor activities that explain the measured noise levels in each condition. These data were obtained by performing Motif Activity Response Analysis (MARA) on the noise levels of all measured promoters in each condition.
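
As an illustration of how the noise estimates in dataset_with_noise_estimates are defined, the following minimal Python sketch applies the expression threshold and then subtracts a noise floor from the variance in log-expression. The function names, the parameters a and b, and in particular the functional form of example_floor are assumptions made for this example only; the fitted floor parameters actually used are the ones provided in the dataset for each condition.

```python
from typing import Callable, Optional


def example_floor(mean: float, a: float = 0.02, b: float = 50.0) -> float:
    """Hypothetical noise floor: minimal log-expression variance at a given mean.

    The form a + b / mean and the default values are placeholders for
    illustration, not the fit used in the paper.
    """
    return a + b / mean


def noise_estimate(
    mean: float,
    log_variance: float,
    autofluorescence: float,
    floor: Callable[[float], float] = example_floor,
) -> Optional[float]:
    """Return the variance in log-expression minus the noise floor at this mean.

    Promoters whose mean fluorescence is below autofluorescence fall under
    the expression threshold and get no estimate (None).
    """
    if mean < autofluorescence:
        return None
    return log_variance - floor(mean)


# Made-up numbers, not values from the dataset.
print(noise_estimate(mean=1000.0, log_variance=0.12, autofluorescence=200.0))
print(noise_estimate(mean=100.0, log_variance=0.50, autofluorescence=200.0))
```

Passing the floor as a function keeps the subtraction step independent of whichever floor parametrization is fitted for a given condition.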

Notes

We would also like to acknowledge funding from the Werner Siemens Stiftung and the SystemsX.ch StoNets grant.

Files

info_files_v2.zip

Files (22.7 GB)

Checksum (Size)
md5:c2c994d4f20ce3bfc05ad690966e7136 (1.4 kB)
md5:3d0fa289d0f1bd741ef10be3708239ce (296.9 kB)
md5:d416f9139e485abab0b9a258ae1ebfcd (677.1 MB)
md5:ac5434223b69b2cc5220f9796134d1ce (22.0 GB)
md5:e1e4f604526e1aff01ab3ddb9c119dcb (3.2 MB)
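
The md5 checksums listed above can be used to check that a download completed without corruption. The following is a minimal Python sketch using only the standard library; the helper names md5_of_file and verify_md5 and the file path in the usage comment are assumptions for illustration. The listing above does not show which checksum belongs to which file, so the pairing has to be taken from the repository page.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks
    so that multi-gigabyte archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected: str) -> bool:
    """Compare a file's md5 with an expected value.

    An optional 'md5:' prefix, as used in the file listing, is ignored.
    """
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex


# Usage (the path is a placeholder for a locally downloaded file):
# ok = verify_md5("path/to/downloaded_file.zip", "md5:<checksum from the listing>")
```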

Additional details

Funding

Swiss National Science Foundation
The role of gene expression noise in the evolution of gene regulation 31003A_159673