Dataset Open Access

Datasets for predicting TF binding using Virtual ChIP-seq

Mehran Karimzadeh; Michael M. Hoffman

This repository contains datasets necessary for using the Virtual ChIP-seq software.

Virtual ChIP-seq requires the following datasets to predict transcription factor binding:

  • chipExpDir_AtoH_V1.0.0.tar.gz: Reference matrices of correlation between TF binding and gene expression for TFs starting with letters A-H.

  • chipExpDir_ItoZ_V1.0.0.tar.gz: Reference matrices of correlation between TF binding and gene expression for TFs starting with letters I-Z.

  • refTables_V1.0.0.tar.gz: PhastCons genomic conservation, FIMO PWM scores for JASPAR motifs, and ChIP-seq data from the ENCODE and Cistrome databases.

  • hg38_chrsize.tsv: Length of chromosomes in hg38

  • trainedModels_V1.0.0.tar.gz: Virtual ChIP-seq scikit-learn trained models saved in joblib format

  • <CellType>.tar.gz: Pre-calculated matrices suitable for training with other algorithms or re-training with Virtual ChIP-seq.
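Before running a prediction, it can help to confirm that the required files are all present in one place. The following is a minimal sketch, not part of Virtual ChIP-seq itself: the function name `missing_required_files` and the idea of keeping every download in a single directory are assumptions made for illustration. The file names are taken from the file listing of this record.

```python
from pathlib import Path

# File names as they appear in the file listing of this record.
# The <CellType>.tar.gz archives are left out because they are only
# needed for re-training.
REQUIRED_FILES = (
    "chipExpDir_AtoH_V1.0.0.tar.gz",
    "chipExpDir_ItoZ_V1.0.0.tar.gz",
    "refTables_V1.0.0.tar.gz",
    "hg38_chrsize.tsv",
    "trainedModels_V1.0.0.tar.gz",
)


def missing_required_files(data_dir):
    """Return the required file names that are not present in data_dir.

    Hypothetical helper: assumes all downloads sit directly inside
    data_dir and have not been renamed.
    """
    data_dir = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]
```

An empty list returned for the download directory means every required file was found.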

Some predictive features of TF binding are the same in every cell type and are stored together, for simplicity, in refTables_V1.0.0.tar.gz. To re-train the model, you can use the datasets for other cell types (named here <CellType>.tar.gz). Each <CellType>.tar.gz file contains pre-calculated predictive features of transcription factor binding for four chromosomes (5, 10, 15, and 20).

These features include:

  • PhastCons genomic conservation

  • FIMO score for sequence motifs of TF in the JASPAR database

  • Chromatin accessibility

  • TF binding in ENCODE + Cistrome DB datasets

  • Virtual ChIP-seq expression score
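The internal layout of the <CellType>.tar.gz archives is not described here, so one cautious first step is to list what an archive contains before extracting several gigabytes. The sketch below uses only Python's standard `tarfile` module; the function name `list_archive_members` and the `limit` parameter are hypothetical and chosen for illustration.

```python
import tarfile


def list_archive_members(archive_path, limit=20):
    """Return up to `limit` (name, size_in_bytes) pairs for the regular
    files stored in a gzip-compressed tar archive.

    Nothing is extracted to disk, although the archive is still read
    sequentially until enough regular files have been seen.
    """
    members = []
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            if member.isfile():
                members.append((member.name, member.size))
                if len(members) >= limit:
                    break
    return members
```

For example, `list_archive_members("Raji.tar.gz")` would return the first 20 regular files in the smallest cell-type archive of this record.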


Files (127.7 GB)

Name                                            Size       MD5
A549.tar.gz                                     5.6 GB     b94ddbdd23c9c3f3bf19da32a1c81386
BJ.tar.gz                                       576.8 MB   8d32e610ddaaa3dc51c1160a4412854a
chipExpDir_AtoH_V1.0.0.tar.gz                   28.3 GB    73cd1b537594d6c5292bcf565bed53b5
chipExpDir_ItoZ_V1.0.0.tar.gz                   28.6 GB    1cab4d28f8b19590f0d54df34eabc5f7
GM12878.tar.gz                                  8.0 GB     fd571b8c85e838c1e0376473ec64377f
H1.tar.gz                                       5.7 GB     1b58bcb51013ddc0af22f8e675348404
HCT-116.tar.gz                                  2.7 GB     77790b28dbe2d26a2b9d794dff8b0007
HeLa-S3.tar.gz                                  5.2 GB     a0865de52384e0bbc1469d122b41d12d
HepG2.tar.gz                                    7.3 GB     545563edd74c082b23d2560b8a71431f
hg38_chrsize.tsv                                365 Bytes  50e491e8a8e9b0019a6c15c5d9a890ff
hg38_EncodeBlackListedRegions_200bpBins.bed.gz  1.1 kB     69bee0c45cdbb862b236c396cedb8a6e
IMR-90.tar.gz                                   9.7 GB     501962ea3cb66864a67f500fde72ef63
Ishikawa.tar.gz                                 1.9 GB     3f81f671584da330deabe726cf003ea4
Jurkat.tar.gz                                   685.0 MB   2db7b0208a0bd2b550051a9eea9c322b
K562.tar.gz                                     8.8 GB     2772ebf82e301b61cbf12b339d346416
Liver.tar.gz                                    1.5 GB     0031fd1d55911a5f47da2a7d4fc13d39
LNCaP.tar.gz                                    1.9 GB     d5eb27535bdd55ebbd654551fe36a3c7
MCF-7.tar.gz                                    6.4 GB     67d5243c981b7c2a97360b9495dd78ec
NHEK.tar.gz                                     702.4 MB   6844458b2fe22225b5f4118f84d52580
PANC-1.tar.gz                                   559.6 MB   78f721adbe6c570394cdd86c97c95b23
Raji.tar.gz                                     136.9 MB   47a5ae471179ddc52992df6129d3a947
refTables_V1.0.0.tar.gz                         2.3 GB     5ecabb174c6f6bc5eae70a6e31b18859
T47D.tar.gz                                     987.2 MB   4d2cc333adc6ccba713f4b4d0253f4e1
trainedModels_V1.0.0.tar.gz                     70.3 MB    e3a24d6c6e0428f4036ed58ca6f9735f
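
Because several of these files are tens of gigabytes, it is worth checking a download against its listed MD5 checksum before use. The following is a minimal sketch using Python's standard `hashlib` module; the function names `md5_of_file` and `matches_md5` and the 1 MiB chunk size are illustrative choices, not part of Virtual ChIP-seq.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks so that
    multi-gigabyte archives are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """Return True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_md5("hg38_chrsize.tsv", "50e491e8a8e9b0019a6c15c5d9a890ff")` should return True for an intact copy of that file, given the checksum listed above.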