The cis-regulatory codes of response to combined heat and drought stress in Arabidopsis thaliana
Authors/Creators
- 1. St. Vincent's Institute, Melbourne, Australia
- 2. University of Michigan
- 3. Michigan State University
Description
Datasets used to train and test random forest and convolutional neural network models that predict transcriptional response patterns to single and combined heat and drought stress in Arabidopsis. Rows correspond to genes. The first column gives the class: 1 indicates the response group and 0 the non-responsive group (e.g., in "NNU_merged_df.txt", 1 = NNU and 0 = NNN). The remaining columns are the pCRE and pCRE-omic overlap features: "1" denotes that the pCRE is present in the promoter region of that gene (or, for overlap features, present and overlapping with the omic feature), and "0" denotes that the pCRE is not present (or present but not overlapping with the omic feature). Feature names indicate the pCRE and the omic feature in the form "pCRE_OmicFeature".
For more information on how these datasets were generated, and for the code used to implement and interpret the machine learning models, see the manuscript and the associated GitHub repository.
GitHub: https://github.com/ShiuLab/Manuscript_Code/tree/master/2019_CRC_HeatDrought
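The layout described above (genes in rows, a class column, then binary pCRE features) could be read along the following lines. This is a minimal sketch, not the authors' loading code. It assumes the files are tab-delimited with a header row and that a gene identifier field precedes the class column; neither detail is stated in this record. The gene IDs and feature names in the sample are hypothetical, and `load_merged_df` and `split_feature_name` are illustrative helper names.

```python
import csv
import io


def load_merged_df(handle):
    """Parse a *_merged_df.txt style table from an open text handle.

    Assumed layout (not confirmed by the record): tab-delimited, one header
    row, field 0 = gene identifier, field 1 = class (1 or 0), remaining
    fields = binary pCRE / pCRE-omic overlap features.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    feature_names = header[2:]
    genes, labels, features = [], [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        genes.append(row[0])
        labels.append(int(row[1]))
        features.append([int(value) for value in row[2:]])
    return genes, labels, feature_names, features


def split_feature_name(name):
    """Split "pCRE_OmicFeature" at the first underscore.

    Returns (pCRE, omic_feature); omic_feature is None when the name has no
    underscore. Assumes pCRE names themselves contain no underscore.
    """
    pcre, sep, omic = name.partition("_")
    return pcre, (omic if sep else None)


# Hypothetical three-gene sample in the assumed layout.
sample = (
    "ID\tClass\tACGTGG\tACGTGG_DHS\n"
    "gene_1\t1\t1\t1\n"
    "gene_2\t0\t1\t0\n"
    "gene_3\t0\t0\t0\n"
)
genes, labels, names, X = load_merged_df(io.StringIO(sample))
print(labels)
print([split_feature_name(n) for n in names])
```

For a downloaded file, the same function would be called on `open(path, newline="")`. If the real files lack the identifier field, the column offsets would need to shift by one.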
Abstract: Plants respond to their environment by dynamically modulating gene expression. A powerful approach for understanding how these responses are regulated is to integrate information about cis-regulatory elements (CREs) into models called cis-regulatory codes. Transcriptional response to combined stress is typically not the sum of the responses to the individual stresses. However, cis-regulatory codes underlying combined stress response have not been established. Here we modeled transcriptional response to single and combined heat and drought stress in Arabidopsis thaliana. We grouped genes by their pattern of response (independent, antagonistic, synergistic) and trained machine learning models to predict their response using putative CREs (pCREs) as features (median F-measure = 0.64). We then developed a deep learning approach to integrate additional omics information (sequence conservation, chromatin accessibility, histone modification) into our models, improving performance by 6.2%. While pCREs important for predicting independent and antagonistic responses tended to resemble binding motifs of transcription factors associated with heat and/or drought stress, important synergistic pCREs resembled binding motifs of transcription factors not known to be associated with stress. These findings demonstrate how in silico approaches can improve our understanding of the complex codes regulating response to combined stress and help us identify prime targets for future characterization.
Files
Previewed file: DND_merged_df.txt
Total size: 56.9 MB
| MD5 checksum | Size |
|---|---|
| ca610c743fcdcd47225bf95a2367b4ff | 7.3 MB |
| cbbfb8b0db21071dd2b081742d61ae8b | 14.7 MB |
| 71952058c255aca088649bdcbb5969db | 5.7 MB |
| b442dadc2ecc5506e5922a428b6af086 | 4.6 MB |
| df1cc10814c0393e21c3854d6501ecad | 3.5 MB |
| ac65d0b1dbf2288a52ac3d57d769a9e5 | 8.3 MB |
| 89327040c97ce65ca87ede2bdc0361d8 | 12.8 MB |
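The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_checksum` are illustrative, and the file path in the usage comment is a placeholder. This listing does not show which checksum belongs to which file, so the expected value has to be taken from the repository page for the file in question.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 digest with an expected hex string."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Usage (placeholder path and checksum):
# ok = matches_checksum("path/to/downloaded_file.txt", "<md5 for that file>")
```

Reading in chunks keeps memory use flat for the larger files in this record.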