Domain problem formulation

Enhancers regulate on/off patterns of gene expression in developing embryos to control tissue differentiation and organ formation. Experiments on individual enhancer elements in Drosophila, in particular eve stripe 2, have demonstrated that high-order interactions among DNA binding proteins known as transcription factors (TFs) play an important role in regulating activity (Small, Blair, and Levine 1992; Andrioli et al. 2002; Ilsley et al. 2013; Levine 2013). Here, we seek to identify TF interactions that are associated with enhancer activity in the early Drosophila embryo from high-throughput genomic data. In particular, previous analyses have systematically characterized the activity of genomic regions in Drosophila to determine whether they act as enhancers in particular developmental contexts (Evgeny Z Kvon et al. 2014). We build on our previous work studying TF interactions in this high-throughput data (Basu et al. 2018) to develop a principled approach for conducting inference on high-order interactions that are associated with enhancer activity.

Data collection

The observations in our anaylsis represent 7809 segments of the Drosophila genome. For each of these segments, we generated a set of features and response labels based on genomic data measured in two seperate experiments.

Experiment 1: response labels

Each of the segments considered in our anlysis have been experimentally evaluated for enhancer activity (E. Z. Kvon et al. 2014; Hammonds et al. 2013; Frise, Hammonds, and Celniker 2010). In these experiments, each segment is placed in a reporter construct and integrated into a specific site in the genome. The transgenic fly line is then evaluated to determine whether or not the segment is sufficient to drive gene expression patterns at a particular time and in a particular region of the embryo. Segments that result in patterned gene expression in the transgenic flies are labeled as active enhancers.

Experiment 2: features

The features for our analysis come from ChIP-chip and ChIP-seq assays that measure DNA binding affinities for 23 TFs in stage 5 blastoderm embryos (Li et al. 2008, 2014; MacArthur et al. 2009). For a single assay, an antibody specific to the TF of interest is used to filter sheared segments of DNA from across the entire embryo. The filtered segments of DNA are measured using a microarray and subsequently mapped back to a reference genome. This procedure produces genome-wide maps of DNA binding for each of the targeted TFs. Based on this data, we evaluated the binding activity for each TF in regions that had also been evaluated for enhancer activity. It is important to note that the maps represent DNA binding across the entire embryo. As a result, two TFs that bind the same location in the genome are not necessarily binding in the same cells.

Data storage

Raw data are available through the Berkeley Drosophila Transcription Network Project (BDTNP). Preprocessed data used for this analysis are available on Zenodo.

Data cleaning and preprocessing

For our prediction problem, segments that drive patterned expression in stage 5 blastoderm embryos were labeled as active elements. This develomental stage corresponds to the time point that ChIP-chip data were measured. To form the set of features for our analysis, we used ChIP peaks from BDTNP and (Li et al. 2014), which are available at 1% and 25% FDR thresholds. We focused our analysis on the 25% FDR peaks, allowing supervised learning models to determine which peaks were relevant for predicting enhancer activity. This approach limits the potential for screening out important signal, and allows us to evaluate the quality of selected peaks directly based on predictive accuracy. Since the percent of enhancer region contained in each segment was unknown, the average ChIP peak is a potentially biased estimator. We therefore computed the maximum peak value of each ChIP assay for each segment in the dataset.

In cases where replicate ChIP peaks are available for a given TF, we take the maximum peak value across replicates. This approach treats a TF as active in a particular region if any replicates identify it as bound in the region. We note that taking the average peak value across replicates yields qualitatively similar results, which can be seen be adjusting the aggregating function below (see alternative documentation).

# Aggregate replicated features by taking the maximum across all replicates
colnames(x) <- varnames
varnames <- unique(varnames)
aggregate <- function(g, x, FUN=max) {
  xx <- as.matrix(x[,colnames(x) == g])
  out <- apply(xx, MAR=1, FUN)
  return(out)
}

x <- sapply(varnames, aggregate, x=x)

Prediction formulation

We searched for candidate biological interactions using iteratively re-weighted Random Forests (RF), whose decision rules match the combinatorial, thresholding behavior of stereospecific biological interactions (Nelson, Lehninger, and Cox 2008). Given an iteratively re-weighted RF, we search for features that frequently co-occur along decision paths, and thus may be good candidates for predictive interacitons. To generate predictions for a single interaction, we use decision rules containing the given interaction as detailed in (Kumbier et al. 2018). In summary, we collect the decision paths that contain a given interaction from across the RF and use the portion of each decision rule corresponding to the interacting features to make predicitons. This approach generates predictions that depend only on interacting features, allowing us to evaluate how a set of features is collectively associated with responses independently of any other feature. In the context of our biological question, these predictions evaluate whether binding by a collection of TFs is sufficient to differentiate between active and inactive enhancers without knowing the behavior of other TFs.

The 3D architecture of the genome is known to play an important regulatory role by organizing DNA into interacting regions associated with gene expression. However, limited knowledge of the 3D structure of the genome does not suggest a well justified approach for sample splitting. Moreover, we were interested in identifying candidate interactions across the entire genome. We therefore generated a held-out test set by randomly sampling \(50\%\) of the segments in our dataset. This approach, as opposed to a block sampling approach (e.g. splitting chromosomes) ensured that the interactions we recovered were not biased towards a particular region (e.g. chromosome) relative to the available data. We note however, that the enhancers tested by (Evgeny Z Kvon et al. 2014) and used in this analysis were experimentally selected and do not represent an unbiased sample of the enitire genome.

The predictions from iRF represent the proportion of decision trees that predict a given observation to be an active enhancer. We evaluated these predictions based on the area under the precision recall curve (AUC-PR). This metric evaluates precision and recall across all possible thresholds of iRF predictions to quantify prediction accuracy without relying on a user specified threshold. AUC-PR provides a more holistic representation of prediction accuracy in our setting, where there is no well-established threshold. Moreover, AUC-PR does not account for true negatives. This makes the metric well-suited metric well suited to our problem where classes are imbalanced (\(\sim 10 \%\)) active enhancers.

test.prop <- 0.5 # proportion of observations to sample as held-out test set

test.id <- sample(1:nrow(x), floor(nrow(x) * test.prop))
train.id <- setdiff(1:nrow(x), test.id)

ytest <- y[test.id]
xtest <- x[test.id,]

Stability formulation

Bootstrap sampling is a widely accepted perturbation scheme for problems in genomics that is a useful baseline for data where we have limited understanding of the dependencies. However, sequences located in similar regions of genome space (i.e. nearby on the DNA) exhibit dependent behavior that may influence results. In particular, enhancers that perform redundant tasks, known as “shadow enhancers,” are believed to confer robustness to regulatory processes (Hong, Hendrix, and Levine 2008). (Cannavò et al. 2016) studied shadow enhancers in detail and found that over \(70\%\) of loci they examined have anywhere from 2-5 shadow enhancers with highly overlapping patterns of activity. To account for this potential dependency along the genome, we also considered a block bootstrap perturbation strategy. We sampled blocks of segments from our dataset that appear sequentially along the genome, and considered blocks of size 5 (2 upstream, 2 downstream segments) and 11 (5 upstream and 5 downstream segments) to match the range in number of shadow enhancers reported in (Cannavò et al. 2016). We evaluated the prevalence of each interaction across \(B=100\) RFs trained on an outer layer of bootstrap samples using the 3 proposed perturbation schemes.

# Create an ordered list of the candidate enhancer segments based on their
# positions along each chromosome
gene.coords <- group_by(gene.coords, chr) %>%
    arrange(desc(r6start)) %>%
    ungroup()

# Generate blocks of 5 and 11 segments arranged consecutively along the genome
block5.tr <- makeBlocks(gene.coords, idcs=train.id, size=5)
block11.tr <- makeBlocks(gene.coords, idcs=train.id, size=11)
block5.tst <- makeBlocks(gene.coords, idcs=test.id, size=5)
block11.tst <- makeBlocks(gene.coords, idcs=test.id, size=11)

Hypothesis testing formulation

To test whether certain interactions are enriched among active enhancer elements, we use a sample splitting approach that treats class-\(0\) observations (inactive enhancer elements) as a control. For each interaction \(S\) recovered by iRF, we compare its prevalence along RF decision paths in class-\(1\) and class-\(0\) predicted leaf nodes, denoted \(P(S|1)\) and \(P(S|0)\) respectively (Kumbier et al. 2018). Intuitively, this uses a data perturbation (evaluating prevalence separately for each class of observation) to test a hypothesis that a given interaction occurs more frequently when an enhancer is active. For each bootstrap sample, we evaluate \(P(S|1) > P(S|0)\) to determine the proportion perturbations for which an interaction is enriched among active enhancers. More formally, we test the hypothesis: \[\begin{equation} H_0^{(enrich)}: P(S|1) = P(S|0) \end{equation}\]

In addition, we test a hypothesis that aims to evaluate the strength of an interaction based on the probability of feature selection across an RF. Intuitively, for an interaction \(S\), we would like to determine whether selecting some of the features in \(S\) increases the probability of selecting the remaining features. Formally, for an interaction \(S\), we evaluate whether it is selected more frequently than by chance based on the probability of selecting \(S', S''\) that partition \(S\). That is, we test the hypothesis: \[\begin{equation} H_0 ^ {(interact)}: P(S|1) = \max_{(S', S'') \in \mathcal{S}} P(S'|1)P(S''|1) \end{equation}\] where \(\mathcal{S}\) represents pairs of interactions that partition \(S\). Since testing all lower order subsets becomes computationally infeasible for large interactions, we restrict to the case where \(|S''| = 1\). Intuitively, this tests whether the final featue is selected independently of the others.

Model fitting & Prediction analysis

Below, we fit an iterative Random Forest (iRF) using standard bootstrap sampling using the number of iterations reported in (Basu et al. 2018). We evaluate predictive accuracy for the full random forest on randomly sampled held out test data. We achieve an AUC-PR of \(~0.48\), comparable to previously reorpted results for this data (Basu et al. 2018; Arbel et al. 2019). In addition, we evaluate predictive accuracy for each interaction using the held-out test set and evaluate the stability of these predictions using the bootstrap samples described above.

n.iter <- 4 # number of iterations taken from Basu et al. (2018)
n.bootstrap <- 100 # number of outer-layer bootstrap samples for iRF

# Fit an iRF to predict enhancer activity based on TF binding activity
fit.bs <- iRF(x=x[train.id,], y=as.factor(y[train.id]),
              n.iter=n.iter, 
              n.core=n.core, 
              n.bootstrap=n.bootstrap,
              varnames.grp=varnames,
              verbose=FALSE,
              interactions.return=n.iter)
# Evaluate iRF predictions on the held-out test set
ypred <- predict(fit.bs$rf.list, newdata=x[test.id,], type='prob')[,2]
auc.full <- aucPR(ypred=ypred, ytest=ytest)
print(auc.full)
## [1] 0.4837065
# Extract decision path information from iRF to generate predictions for
# individual interactions
read.f <- readForest(fit.bs$rf.list, x=x[train.id,], varnames.grp=varnames,
                     get.split=TRUE, n.core=n.core)

# Generate predictions for individual interactions recovered by iRF
ints <- fit.bs$importance$int
predictions <- mclapply(ints, interactPredict, read.forest=read.f,
                        x=x[test.id,], varnames.grp=varnames,
                        mc.cores=n.core)

# Evaluate the prediction accuracy (AUC-PR) of individual interactions 
# recovered by iRF
aucs <- sapply(predictions, aucPR, ytest=ytest)
names(aucs) <- ints

# Evaluate AUC-PR for each interaction across standard bootstrap samples of the
# held-out test set
ntest <- length(test.id)
pred.bootstrap <- 100 # number of bootstrap replicates for prediction
bsid <- replicate(pred.bootstrap, bssample(ntest), simplify=FALSE)
bsaucs <- mclapply(bsid, bootstrapAUC, y=y, predictions=predictions,
                   test.id=test.id, mc.cores=n.core)
bsaucs <- do.call(rbind, bsaucs)
colnames(bsaucs) <- ints

# Evaluate AUC-PR for each interaction across block bootstrap samples of the
# held-out test set, with blocks of size 5
bl5.id <- replicate(pred.bootstrap, blockbs(block5.tst), simplify=FALSE)
bl5aucs <- mclapply(bl5.id, bootstrapAUC, y=y, predictions=predictions, 
                    test.id=test.id, mc.cores=n.core)
bl5aucs <- do.call(rbind, bl5aucs)
colnames(bl5aucs) <- ints

# Evaluate AUC-PR for each interaction across block bootstrap samples of the
# held-out test set, with blocks of size 11
bl11.id <- replicate(pred.bootstrap, blockbs(block11.tst), simplify=FALSE)
bl11aucs <- mclapply(bl11.id, bootstrapAUC, y=y, predictions=predictions,
                     test.id=test.id, mc.cores=n.core)
bl11aucs <- do.call(rbind, bl11aucs)
colnames(bl11aucs) <- ints

Stability analysis

Below, we fit an iterative Random Forest (iRF) using the block bootstrap sampling described above and the number of iterations reported in (Basu et al. 2018). Comparing the interactions recovered across each bootstrap samling approach allows us to identify interactions that are stable with respect to data perturbation. We filter these interactions based on their predictive accuracy, taking the those whose associated predictions achieve an AUC-PR of at least 90% of the full iRF AUC-PR. In addition, we filter interactions that satisfy the hypothesis criteria described above in at least \(50\%\) of bootstrap samples for all bootstrap samling approaches.

# Filter interactions based on predictive accuracy: remove any interaction
# whose accuracy is not at least 90% of the full random forest AUC-PR
stability <- stability %>% filter(auc / auc.full >= 0.9)

# Filter interactions based on hypothesis testing formulation: remove any
# interaction that is not enriched (H_0 ^ (enrich)) and selected in a dependent
# manner (H_0 ^ (interact)) for at least 50% of bootstrap replicates for all
# types of bootstrap sampling 
stability <- stability %>% filter(sta.diff.x >= 0.5, sta.prev.x >= 0.5, 
                                  sta.diff.y >= 0.5, sta.prev.y >= 0.5, 
                                  sta.diff >= 0.5, sta.prev >= 0.5)

Results

Figs. A and B report the distribution of predictive accuracy (AUC-PR) and prevalence, which represents the stability of selection across an RF, respectively for each of the bootstrap perturbations after filtering. We note that all of the interactions reported below are enriched relative to non-enhancer elements (i.e. P(S|1) > P(S|0)) across all bootstrap replicates. Broadly speaking, the predictive accuracy of interactions is highly similar across different bootstrap perturbations. However, the predictive accuracy of a given interaction can vary considerably across bootstrap samples of a single perturbation. We hypothesize that this is due to the highly hetergeneous activity of enhancer elements. That is, enhancers can play a wide variety of roles in different developmental contexts. The simple binary response used to characterize each element does not account for this variability in behavior.

Interactions that are both predictive and highly stable are all among the regulatory factors \(Gt\), \(Kr\), \(Zld\), and \(Twi\). All of the pairwise interactions among these factors have been previously reported, for references see (Basu et al. 2018). The third order interactions suggest novel hypotheses for follow up wet lab studies.

## Using int as id variables

References

Andrioli, Luiz Paulo Moura, Vikram Vasisht, Ekaterina Theodosopoulou, Adam Oberstein, and Stephen Small. 2002. “Anterior Repression of a Drosophila Stripe Enhancer Requires Three Position-Specific Mechanisms.” Development 129 (21): 4931–40.

Arbel, Hamutal, Sumanta Basu, William W Fisher, Ann S Hammonds, Kenneth H Wan, Soo Park, Richard Weiszmann, et al. 2019. “Exploiting Regulatory Heterogeneity to Systematically Identify Enhancers with High Accuracy.” Proceedings of the National Academy of Sciences 116 (3): 900–908.

Basu, Sumanta, Karl Kumbier, James B Brown, and Bin Yu. 2018. “Iterative Random Forests to Discover Predictive and Stable High-Order Interactions.” Proceedings of the National Academy of Sciences, 201711236.

Cannavò, Enrico, Pierre Khoueiry, David A Garfield, Paul Geeleher, Thomas Zichner, E Hilary Gustafson, Lucia Ciglar, Jan O Korbel, and Eileen EM Furlong. 2016. “Shadow Enhancers Are Pervasive Features of Developmental Regulatory Networks.” Current Biology 26 (1): 38–51.

Frise, E., A. S. Hammonds, and S. E. Celniker. 2010. “Systematic image-driven analysis of the spatial Drosophila embryonic expression landscape.” Mol. Syst. Biol. 6: 345.

Hammonds, A. S., C. A. Bristow, W. W. Fisher, R. Weiszmann, S. Wu, V. Hartenstein, M. Kellis, B. Yu, E. Frise, and S. E. Celniker. 2013. “Spatial expression of transcription factors in Drosophila embryonic organ development.” Genome Biol. 14 (12): R140.

Hong, Joung-Woo, David A Hendrix, and Michael S Levine. 2008. “Shadow Enhancers as a Source of Evolutionary Novelty.” Science (New York, NY) 321 (5894): 1314.

Ilsley, Garth R, Jasmin Fisher, Rolf Apweiler, Angela H DePace, and Nicholas M Luscombe. 2013. “Cellular Resolution Models for Even Skipped Regulation in the Entire Drosophila Embryo.” Elife 2: e00522.

Kumbier, Karl, Sumanta Basu, James B Brown, Susan Celniker, and Bin Yu. 2018. “Refining Interaction Search Through Signed Iterative Random Forests.” arXiv Preprint arXiv:1810.07287.

Kvon, Evgeny Z, Tomas Kazmar, Gerald Stampfel, J Omar Yáñez-Cuna, Michaela Pagani, Katharina Schernhuber, Barry J Dickson, and Alexander Stark. 2014. “Genome-Scale Functional Characterization of Drosophila Developmental Enhancers in Vivo.” Nature 512 (7512): 91.

Kvon, E. Z., T. Kazmar, G. Stampfel, J. O. Yanez-Cuna, M. Pagani, K. Schernhuber, B. J. Dickson, and A. Stark. 2014. “Genome-scale functional characterization of Drosophila developmental enhancers in vivo.” Nature 512 (7512): 91–95.

Levine, Michael. 2013. “Computing Away the Magic?” eLife 2: e01135.

Li, Xiao-Yong, Melissa M Harrison, Jacqueline E Villalta, Tommy Kaplan, and Michael B Eisen. 2014. “Establishment of Regions of Genomic Activity During the Drosophila Maternal to Zygotic Transition.” Elife 3: e03737.

Li, Xiao-yong, Stewart MacArthur, Richard Bourgon, David Nix, Daniel A Pollard, Venky N Iyer, Aaron Hechmer, et al. 2008. “Transcription Factors Bind Thousands of Active and Inactive Regions in the Drosophila Blastoderm.” PLoS Biology 6 (2): e27.

MacArthur, Stewart, Xiao-Yong Li, Jingyi Li, James B Brown, Hou Cheng Chu, Lucy Zeng, Brandi P Grondona, et al. 2009. “Developmental Roles of 21 Drosophila Transcription Factors Are Determined by Quantitative Differences in Binding to an Overlapping Set of Thousands of Genomic Regions.” Genome Biology 10 (7): R80.

Nelson, David L, Albert L Lehninger, and Michael M Cox. 2008. Lehninger Principles of Biochemistry. Macmillan.

Small, Stephen, Adrienne Blair, and Michael Levine. 1992. “Regulation of Even-Skipped Stripe 2 in the Drosophila Embryo.” The EMBO Journal 11 (11): 4047–57.