Published September 14, 2019 | Version v1
Dataset Open

Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development

Authors/Creators

  • 1. Princeton University

Description

This upload contains the expression prediction dataset discussed in the manuscript "Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development" and used by the webserver https://find.princeton.edu.

Abstract:

Comprehensive information on the timing and location of gene expression is fundamental to our understanding of embryonic development and tissue formation. While high-throughput in situ hybridization projects provide invaluable information about developmental gene expression patterns for model organisms like Drosophila, the output of these experiments is primarily qualitative, and a high proportion of protein-coding genes and most non-coding genes lack any annotation. Accurate data-centric predictions of spatio-temporal gene expression will therefore complement current in situ hybridization efforts. Here, we applied a machine learning approach, training models on all public gene expression and chromatin data, including data from whole-organism experiments, to provide genome-wide, quantitative spatio-temporal predictions for all genes. We developed structured in silico nano-dissection, a computational approach that predicts gene expression in >200 tissue-developmental stages. The algorithm integrates expression signals from a compendium of 6,378 genome-wide expression and chromatin profiling experiments in a cell lineage-aware fashion. We systematically evaluated our performance via cross-validation and experimentally confirmed 22 new predictions for four different embryonic tissues. The model also predicts complex, multi-tissue expression and developmental regulation with high accuracy. We further show the potential of applying these genome-wide predictions to extract tissue specificity signals from non-tissue-dissected experiments, and to prioritize tissues and stages for disease modeling. This resource, together with the exploratory tools, is freely available at our webserver http://find.princeton.edu, which provides a valuable tool for a range of applications, from predicting spatio-temporal expression patterns to recognizing tissue signatures from differential gene expression profiles.

Files

Files (67.1 MB)

Name: all_predictions.zip
Size: 67.1 MB
md5:0100d7919dbb05fbf80e3b0e6a47fd30
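
Since the record lists an MD5 checksum for all_predictions.zip, a downloaded copy can be checked against it before use. The following is a minimal sketch, not part of the original record: the function names `md5_of_file` and `matches_record_checksum` are hypothetical, and it assumes the archive has already been saved to a local path of your choosing.

```python
import hashlib
from pathlib import Path

# Checksum copied from the file listing in this record.
EXPECTED_MD5 = "0100d7919dbb05fbf80e3b0e6a47fd30"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record_checksum(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 equals the expected hex digest."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_record_checksum("all_predictions.zip")` would return True only if the local file (the path here is an assumption) is byte-identical to the archive the checksum was computed from. Reading in chunks keeps memory use small regardless of file size.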