Published November 13, 2025 | Version v3
Journal Open

Deep learning models to predict silencers and enhancers in the DLPFC


Description

We developed a deep learning framework that combines bulk and single-cell epigenomic data to evaluate the regulatory potential of noncoding AD variants (i.e., silencing and activating strength) in the dorsolateral prefrontal cortex (DLPFC) and its major cell types.

A) README.docx

B) The AZzip.zip package includes

  1. all data used for training,

  2. the trained model,

  3. the programs needed to run it,

  4. example datasets for using the programs.

C) The alternativeModels.tar.gz package includes

 1. CNNtran* files are for model A; 

 2. CNNlarge* files are for model B; 

 3. CNNsmall* files are for model C.

 For each model, *_single_model.hdf5 stores the model structure, 

 and *_model_weights.hdf5 stores the weights of the trained model. 

 

Command examples

1)  Data and phase-one model files

a.     pos.bed and allBK.bed are positive and control samples for training.

b.     phase_one_model.hdf5 and phase_one_weights.hdf5 are the built phase-one model structure and its weights.

 

2)  Predicting with the phase-one model.

Example: python phase_one_data_AZpublic.py normal.temp.bed BK.temp.bed hg38.fa

(Please download the genome sequence from https://hgdownload.gi.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz and unzip the file.)

      The output is the input for training/using phase-two models.
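The commands in this section can also be assembled from Python. The sketch below only builds the command lines (script and file names are copied verbatim from the examples in this section); run them with subprocess once the package is unpacked and hg38.fa has been downloaded:

```python
# Minimal sketch: assemble the phase-one and phase-two command lines.
# The script names come from the examples in this README; nothing is
# executed here -- pass the lists to subprocess.run() to actually run them.

def phase_one_cmd(pos_bed, bk_bed, genome_fa="hg38.fa"):
    """Command that scores sequences with the phase-one model."""
    return ["python", "phase_one_data_AZpublic.py", pos_bed, bk_bed, genome_fa]

def phase_two_pred_cmd(model_dir, phase_one_h5, out_h5):
    """Command that runs a trained phase-two model on phase-one output."""
    return ["python", "two_phase.pred.SNV.py", model_dir, phase_one_h5, out_h5]

print(" ".join(phase_one_cmd("normal.temp.bed", "BK.temp.bed")))
```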

 

3) Training a phase-two model using the files in the "data" directory

Example: python two_phase.trainENSL_AZpublic.py

(Using ./example/normal.temp.bed.phase_one.hdf5 as input; the trained model will be stored in ./results.tmp/.)

 

4)  Predicting allele-specific regulatory effect of SNVs

a)     Output of the phase-one model

Example: python phase_one_data_SNP.py SNP.bed hg38.fa

b)    Output of the phase-two model

Example: python two_phase.pred.SNV.py ./modelCNN SNPs.bed.phase_one.pred.hdf5 SNPs.bed.modelCNN.output.hdf5

 

5) For an input sequence, the output is a 9×3 array.

Each row of this array represents the silencer/enhancer predictions on the DLPFC or one cell type.

a)     The 1st column gives the probability of being a silencer,

b)    the 2nd column gives the probability of being nonfunctional,

c)     the 3rd column gives the probability of being an enhancer,

d)    the rows are the predictions for the DLPFC, astrocyte, endothelial cell, excitatory neuron, immune cell, inhibitory neuron, microglia, oligodendrocyte precursor cell, and oligodendrocyte, in that order.
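Each row of the 9×3 array is a softmax distribution over (silencer, nonfunctional, enhancer). A minimal numpy sketch of how the output can be read (the probability values below are placeholders, not real predictions):

```python
import numpy as np

# Row order as described above; column order: silencer / nonfunctional / enhancer.
contexts = ["DLPFC", "astrocyte", "endothelial cell", "excitatory neuron",
            "immune cell", "inhibitory neuron", "microglia",
            "oligodendrocyte precursor cell", "oligodendrocyte"]
labels = ["silencer", "nonfunctional", "enhancer"]

# Hypothetical 9x3 output; each row sums to 1.
pred = np.array([[0.70, 0.20, 0.10]] * 9)

for ctx, row in zip(contexts, pred):
    call = labels[int(np.argmax(row))]   # most probable class per context
    print(f"{ctx}: {call} (p={row.max():.2f})")
```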

 

Architectures of deep learning models

The phase-two model in the DLPFC TREDNet consists of two convolutional layers and three fully connected layers arranged sequentially. The details of this model are:

1.     1-dimensional (1D) convolutional layer with 64 kernels, each having a window size of four and a step size of one.

2.     Maxpooling layer with a window size of three and a step size of two.

3.     Dropout layer with a dropout proportion of 0.2.

4.     1D convolutional layer with 128 kernels, each having a window size of three and a step size of one.

5.     Dropout layer with a dropout proportion of 0.2.

6.     Fully connected layer of 100 neurons with the sigmoid activation function.

7.     Fully connected layer of 50 neurons with the sigmoid activation function.

8.     For each cellular context, a fully connected output layer of 3 neurons with the softmax activation function. The nine cellular contexts are the DLPFC, astrocyte, endothelial cell, inhibitory neuron, excitatory neuron, immune cell, microglia, oligodendrocyte, and oligodendrocyte progenitor cell.
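The feature-map lengths through the convolutional front end follow the standard "valid" (no-padding) length formula; padding is not stated above, so "valid" is assumed, and the 1000 bp input length below is purely illustrative:

```python
def conv_out(length, window, stride=1):
    # Output length of a 'valid' 1D convolution or pooling operation.
    return (length - window) // stride + 1

L = 1000                  # assumed input length (not stated in the README)
L = conv_out(L, 4, 1)     # 1D conv, 64 kernels, window 4, stride 1
L = conv_out(L, 3, 2)     # max-pooling, window 3, stride 2
L = conv_out(L, 3, 1)     # 1D conv, 128 kernels, window 3, stride 1
print(L)                  # feature-map length entering the dense layers
```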

 

We compared this model with three alternatives: (A) a CNN-transformer hybrid model with a number of trainable parameters comparable to the original DLPFC TREDNet, (B) a CNN model with approximately three times more parameters, and (C) a CNN model with approximately three times fewer parameters.

The details of the model (A) are:

1.     1-dimensional (1D) convolutional layer with 64 kernels, each having a window size of four and a step size of one.

2.     Maxpooling layer with a window size of three and a step size of two.

3.     Dropout layer with a dropout proportion of 0.2.

4.     1D convolutional layer with 128 kernels, each having a window size of three and a step size of one.

5.     Dropout layer with a dropout proportion of 0.2.

6.     Transformer encoder with four attention heads and a 128-node feed-forward layer.

7.     Transformer encoder with four attention heads and a 128-node feed-forward layer.

8.     Transformer encoder with four attention heads and a 128-node feed-forward layer.

9.     Transformer encoder with four attention heads and a 128-node feed-forward layer.

10.  Global average pooling layer.

11.  Fully connected layer of 50 neurons with the sigmoid activation function.

12.  For each cellular context, a fully connected output layer of 3 neurons with the softmax activation function. The eight cellular contexts are the DLPFC, astrocyte, endothelial cell, inhibitory neuron, excitatory neuron, microglia, oligodendrocyte, and oligodendrocyte progenitor cell.
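Each transformer encoder in model A combines multi-head self-attention with a feed-forward layer. The numpy sketch below illustrates the standard layout with the stated sizes (four heads, 128-node feed-forward); it is a simplified illustration with random weights, not the authors' implementation, and residual connections and layer normalization are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encoder_block(x, n_heads=4, ff_dim=128):
    """One simplified transformer encoder: multi-head self-attention
    followed by a ReLU feed-forward layer (residuals/norms omitted)."""
    T, d = x.shape
    dh = d // n_heads                      # per-head channel dimension
    heads = []
    for _ in range(n_heads):
        Wq, Wk, Wv = (rng.standard_normal((d, dh)) * 0.1 for _ in range(3))
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        att = softmax(q @ k.T / np.sqrt(dh))   # scaled dot-product attention
        heads.append(att @ v)
    h = np.concatenate(heads, axis=-1)     # back to shape (T, d)
    W1 = rng.standard_normal((d, ff_dim)) * 0.1
    W2 = rng.standard_normal((ff_dim, d)) * 0.1
    return np.maximum(h @ W1, 0) @ W2      # 128-node feed-forward with ReLU

x = rng.standard_normal((50, 128))         # (positions, channels) from the CNN
y = encoder_block(x)
print(y.shape)
```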

 

The details of the model (B) are:

1.     1-dimensional (1D) convolutional layer with 962 kernels, each having a window size of four and a step size of one.

2.     Maxpooling layer with a window size of three and a step size of two.

3.     Dropout layer with a dropout proportion of 0.2.

4.     1D convolutional layer with 360 kernels, each having a window size of three and a step size of one.

5.     Dropout layer with a dropout proportion of 0.2.

6.     Fully connected layer of 100 neurons with the sigmoid activation function.

7.     Fully connected layer of 50 neurons with the sigmoid activation function.

8.     For each cellular context, a fully connected output layer of 3 neurons with the softmax activation function. The eight cellular contexts are the DLPFC, astrocyte, endothelial cell, inhibitory neuron, excitatory neuron, microglia, oligodendrocyte, and oligodendrocyte progenitor cell.

 

The details of the model (C) are:

1.     1-dimensional (1D) convolutional layer with 64 kernels, each having a window size of four and a step size of one.

2.     Maxpooling layer with a window size of three and a step size of two.

3.     Dropout layer with a dropout proportion of 0.2.

4.     1D convolutional layer with 128 kernels, each having a window size of three and a step size of one.

5.     Dropout layer with a dropout proportion of 0.2.

6.     Fully connected layer of 50 neurons with the sigmoid activation function.

7.     Fully connected layer of 25 neurons with the sigmoid activation function.

8.     For each cellular context, a fully connected output layer of 3 neurons with the softmax activation function. The eight cellular contexts are the DLPFC, astrocyte, endothelial cell, inhibitory neuron, excitatory neuron, microglia, oligodendrocyte, and oligodendrocyte progenitor cell. 

We used the Rectified Linear Unit (ReLU) activation function in the convolutional and transformer layers. In the convolutional, transformer, and fully connected layers, the penalty coefficients of the L1 and L2 regularizations were  and , respectively, and the max-norm constraint on the weights of a kernel or neuron was 0.9.
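The L1/L2 penalty coefficients are missing from the text above, so the sketch below uses placeholder values; it only illustrates what an L1+L2 penalty and a 0.9 max-norm constraint mean in practice:

```python
import numpy as np

L1_COEF, L2_COEF = 1e-6, 1e-6  # placeholders; the actual values are not given above

def regularization_penalty(w, l1=L1_COEF, l2=L2_COEF):
    """L1 + L2 penalty added to the training loss for a weight tensor."""
    return l1 * np.abs(w).sum() + l2 * np.square(w).sum()

def max_norm(w, c=0.9):
    """Rescale a kernel/neuron weight vector so its L2 norm is at most c."""
    n = np.linalg.norm(w)
    return w if n <= c else w * (c / n)

w = np.array([1.0, 2.0, 2.0])               # L2 norm 3.0 exceeds the 0.9 bound
print(round(float(np.linalg.norm(max_norm(w))), 6))
```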

Files

AZzip.zip

Files (2.0 GB)

md5:6a74c9f1ff2b1c24f78046f22d9c42a3, 595.4 MB
md5:e23f0397f92dbd7c5ff3d91a1e4eda3c, 1.4 GB
md5:057b1f02fcb32e07a62aed84333ce867, 22.7 kB