Published October 1, 2023 | Version v0.2.0
Dataset Open

Data archive: CICT for single cell RNA-seq network inference

  • 1. New York University

Description

This archive contains the benchmarking input data and results for inferring gene regulatory networks (GRNs) from single-cell gene expression data with the Causal Inference with Composition of Transactions (CICT) method and a selected set of published methods. It accompanies the manuscript "Robust discovery of gene regulatory networks from single-cell gene expression data by Causal Inference Using Composition of Transactions" (Shojaee and Huang, Brief Bioinform 2023; DOI: 10.1093/bib/bbad370). The CICT code is available on GitHub (https://github.com/hlab1/scRNAseqWithCICT/).

The original CICT algorithm was described in Shojaee et al. (arXiv:1608.02658, 2016). The benchmarked methods are those included in the BEELINE benchmarking pipeline (Pratapa et al., Nat Methods 2020), to which we added DEEPDRIM (Chen et al., Brief Bioinform 2021), SCENIC (Aibar et al., Nat Methods 2017), Inferelator 3.0 (Gibbs et al., Bioinformatics 2022), and CellOracle (Kamimoto et al., Nature 2023). Within each dataset, the output subdirectories are named as follows:

* CICT_ewMIshrink_RFmaxdepth10_RFntrees20/: CICT for simulated data
* CICT_v2/: CICT for experimental data
* CELLORACLEDB/: CellOracle for experimental data
* DEEPDRIM72_ewMIshrink_RFmaxdepth10_RFntrees20/: DEEPDRIM for simulated data
* DEEPDRIM72_v2/: DEEPDRIM for experimental data
* INFERELATOR38_ewMIshrink_RFmaxdepth10_RFntrees20/: Inferelator-Prior for simulated data
* INFERELATOR38_v2/: Inferelator-Prior for experimental data
* INFERELATOR34_ewMIshrink_RFmaxdepth10_RFntrees20/: Inferelator-NoPrior for simulated data
* INFERELATOR34_v2/: Inferelator-NoPrior for experimental data
* GENIE3/: GENIE3
* GRNBOOST2/: GRNBoost2
* LEAP/: LEAP
* PIDC/: PIDC
* PPCOR/: PPCOR
* SCENICDB/: SCENIC for experimental data
* SCNS/: SCNS
* SCODE/: SCODE
* SCRIBE/: SCRIBE
* SINCERITIES/: SINCERITIES
* SINGE/: SINGE
* RANDOM/: RANDOM
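
For scripted access, the subdirectory names listed above can be resolved to method names by their prefix (the text before the first underscore). The following Python sketch is illustrative only: `METHOD_BY_PREFIX` and `method_for` are hypothetical names, not part of the archive or of the CICT code.

```python
# Directory-name prefix (text before the first underscore) -> method name,
# transcribed from the list of output subdirectories above.
# METHOD_BY_PREFIX and method_for are hypothetical names for this illustration.
METHOD_BY_PREFIX = {
    "CICT": "CICT",
    "CELLORACLEDB": "CellOracle",
    "DEEPDRIM72": "DEEPDRIM",
    "INFERELATOR38": "Inferelator-Prior",
    "INFERELATOR34": "Inferelator-NoPrior",
    "GENIE3": "GENIE3",
    "GRNBOOST2": "GRNBoost2",
    "LEAP": "LEAP",
    "PIDC": "PIDC",
    "PPCOR": "PPCOR",
    "SCENICDB": "SCENIC",
    "SCNS": "SCNS",
    "SCODE": "SCODE",
    "SCRIBE": "SCRIBE",
    "SINCERITIES": "SINCERITIES",
    "SINGE": "SINGE",
    "RANDOM": "RANDOM",
}


def method_for(dirname: str) -> str:
    """Return the method name for an output subdirectory name."""
    # Drop a trailing slash, then keep only the part before the first "_".
    prefix = dirname.rstrip("/").split("_", 1)[0]
    return METHOD_BY_PREFIX[prefix]  # raises KeyError for unknown prefixes
```

For example, `method_for("INFERELATOR38_v2/")` resolves to `"Inferelator-Prior"`.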

The methods were benchmarked against two kinds of scRNA-seq datasets:
* Simulated datasets produced by the SERGIO simulator from a synthetic network (Dibaeinia et al., Cell Systems 2020), including complete datasets and datasets with dropouts generated with shape parameter k=6.5 and rate parameter q=10, 30, 50, 70, and 80.
* Experimental datasets compiled by the BEELINE pipeline, evaluated at three different levels L0, L1 and L2, with three types of ground truth networks.
    * Evaluation levels:
        * L0: 500 highly varying genes plus TFs
        * L1: 1000 highly varying genes plus TFs
        * L2: 500 highly varying genes plus TFs, plus 500 genes randomly selected from outside the 1000 highly varying genes used in L1
    * Types of ground truths:
        * Cell-type-specific ChIP-seq ground truth (L0, L1, L2)
        * Non-specific ChIP-seq ground truth (L0_ns, L1_ns, L2_ns)
        * Loss-of-function/gain-of-function ground truth (L0_lofgof, L1_lofgof, L2_lofgof)
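
The three evaluation levels and three ground-truth types combine into nine evaluation labels. As a minimal sketch, the labels can be enumerated as follows; the names `LEVELS`, `GROUND_TRUTH_SUFFIX`, and `evaluation_labels` are hypothetical and chosen only for this illustration.

```python
# Evaluation levels and ground-truth suffixes as given in the list above.
LEVELS = ["L0", "L1", "L2"]
GROUND_TRUTH_SUFFIX = {
    "cell-type-specific ChIP-seq": "",
    "non-specific ChIP-seq": "_ns",
    "loss-of-function/gain-of-function": "_lofgof",
}


def evaluation_labels() -> list[str]:
    """Return every level/ground-truth label, grouped by ground-truth type."""
    return [
        level + suffix
        for suffix in GROUND_TRUTH_SUFFIX.values()
        for level in LEVELS
    ]


if __name__ == "__main__":
    print(evaluation_labels())
```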

The directory structure is organized in accordance with the BEELINE benchmarking pipeline. For complete details, please see the BEELINE documentation (https://murali-group.github.io/Beeline/) and GitHub repo (https://github.com/Murali-group/Beeline).
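
After extracting the archive, the method output subdirectories can be located with a recursive search. The sketch below assumes only that each method subdirectory sits directly inside a dataset directory somewhere below the extraction root; the exact layout of the zip file is not described here, and `find_method_dirs` and the path `extracted_archive` are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def find_method_dirs(root, method_dir_names):
    """Map each method subdirectory name to the dataset directories containing it.

    Assumption: a method subdirectory (e.g. "CICT_v2") lives directly inside
    a dataset directory; nothing else about the archive layout is assumed.
    """
    wanted = set(method_dir_names)
    found = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_dir() and path.name in wanted:
            found[path.name].append(path.parent)
    return dict(found)


if __name__ == "__main__":
    # "extracted_archive" is a placeholder for wherever the zip was unpacked.
    hits = find_method_dirs("extracted_archive", ["CICT_v2", "GENIE3"])
    for method_dir, datasets in sorted(hits.items()):
        print(method_dir, len(datasets))
```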

 

Files

beeline_eval_summary11_2023-08-25-CICT_benchmark_archive.zip

Files (17.0 GB)

Additional details

Related works

Is published in
Journal article: 10.1093/bib/bbad370 (DOI)

Funding

Dissecting natural variation in transcription factor - DNA interactions 1R35GM138143-01
National Institutes of Health

Dates

Accepted
2023-09-29