Dataset Open Access

Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data

Christian H. Holland; Jovan Tanevski; Javier Perales-Patón; Jan Gleixner; Manu P. Kumar; Elisabetta Mereu; Brian A. Joughin; Oliver Stegle; Douglas A. Lauffenburger; Holger Heyn; Bence Szalai; Julio Saez-Rodriguez


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">scRNA-seq</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">functional analysis</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">transcription factor analysis</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">pathway analysis</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">benchmark</subfield>
  </datafield>
  <controlfield tag="005">20200214122653.0</controlfield>
  <controlfield tag="001">3564179</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Institute of Computational Biomedicine, Heidelberg University, Faculty of Medicine, Bioquant - Im Neuenheimer Feld 267, 69120 Heidelberg, Germany</subfield>
    <subfield code="a">Jovan Tanevski</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Institute of Computational Biomedicine, Heidelberg University, Faculty of Medicine, Bioquant - Im Neuenheimer Feld 267, 69120 Heidelberg, Germany</subfield>
    <subfield code="a">Javier Perales-Patón</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany</subfield>
    <subfield code="a">Jan Gleixner</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Department of Biological Engineering, MIT, Cambridge MA</subfield>
    <subfield code="a">Manu P. Kumar</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain</subfield>
    <subfield code="a">Elisabetta Mereu</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Department of Biological Engineering, MIT, Cambridge MA</subfield>
    <subfield code="a">Brian A. Joughin</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany</subfield>
    <subfield code="a">Oliver Stegle</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Department of Biological Engineering, MIT, Cambridge MA</subfield>
    <subfield code="a">Douglas A. Lauffenburger</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain</subfield>
    <subfield code="a">Holger Heyn</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Semmelweis University, Faculty of Medicine, Department of Physiology, Budapest, Hungary</subfield>
    <subfield code="a">Bence Szalai</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Institute of Computational Biomedicine, Heidelberg University, Faculty of Medicine, Bioquant - Im Neuenheimer Feld 267, 69120 Heidelberg, Germany</subfield>
    <subfield code="0">(orcid)0000-0002-8552-8976</subfield>
    <subfield code="a">Julio Saez-Rodriguez</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">5485871322</subfield>
    <subfield code="z">md5:a2f6387a668c204d61b5e47c402e745d</subfield>
    <subfield code="u">https://zenodo.org/record/3564179/files/data.zip</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">5378455338</subfield>
    <subfield code="z">md5:d86f800b0f9ffae88858405e7517d4ba</subfield>
    <subfield code="u">https://zenodo.org/record/3564179/files/output.zip</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2019-12-10</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire_data</subfield>
    <subfield code="o">oai:zenodo.org:3564179</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Institute of Computational Biomedicine, Heidelberg University, Faculty of Medicine, Bioquant - Im Neuenheimer Feld 267, 69120 Heidelberg, Germany</subfield>
    <subfield code="0">(orcid)0000-0002-3060-5786</subfield>
    <subfield code="a">Christian H. Holland</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;Data used to test the robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data, described in &lt;a href="https://doi.org/10.1186/s13059-020-1949-z"&gt;Holland et al. 2020&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The folder &lt;em&gt;data&lt;/em&gt; contains raw data and the folder &lt;em&gt;output&lt;/em&gt; contains intermediate and final results of all analyses.&lt;/p&gt;

&lt;p&gt;The associated analysis code and further information are available on &lt;a href="https://github.com/saezlab/FootprintMethods_on_scRNAseq"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Abstract&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Background&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Many functional analysis tools have been developed to extract functional and mechanistic insight from bulk transcriptome data. With the advent of single-cell RNA sequencing (scRNA-seq), it is in principle possible to do such an analysis for single cells. However, scRNA-seq data have characteristics such as drop-out events and low library sizes. It is thus not clear whether transcription factor (TF) and pathway analysis tools established for bulk sequencing can be applied to scRNA-seq data in a meaningful way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Results&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To address this question, we perform benchmark studies on simulated and real scRNA-seq data. We include the bulk RNA-seq tools PROGENy and GO enrichment, which estimate pathway activities, and DoRothEA, which estimates transcription factor (TF) activities. We compare them against SCENIC/AUCell and metaVIPER, which were designed for scRNA-seq. For the in silico study, we simulate single cells from bulk RNA-seq experiments in which TFs or pathways were perturbed. We complement the simulated data with real scRNA-seq data from CRISPR-mediated knock-out experiments. Our benchmarks on simulated and real data reveal performance comparable to that on the original bulk data. Additionally, by analyzing a mixture sample sequenced with 13 scRNA-seq protocols, we show that the TF and pathway activities preserve cell type-specific variability. We also provide the benchmark data for further use by the community.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our analyses suggest that bulk-based functional analysis tools that use manually curated footprint gene sets can be applied to scRNA-seq data, in some cases outperforming dedicated single-cell tools. Furthermore, we find that the performance of functional analysis tools is more sensitive to the choice of gene sets than to the statistic used.&lt;/p&gt;

&lt;p&gt;For questions about the data, please email christian.holland@bioquant.uni-heidelberg.de or use the &lt;a href="https://github.com/saezlab/FootprintMethods_on_scRNAseq/issues"&gt;GitHub issue system&lt;/a&gt;.&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.3564178</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.3564179</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>
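The two 856 fields in the record list each archive's URL ($u), size in bytes ($s) and MD5 checksum ($z, prefixed with "md5:"). Because data.zip and output.zip are each larger than 5 GB, it can be worth checking a download against these values. The following Python sketch is not part of the dataset or the authors' code; it uses only the standard library, and the function names (`list_files`, `md5_of_file`, `verify_file`) are hypothetical. It assumes every 856 field carries $u, $s and $z as in this record.

```python
import hashlib
import os
import xml.etree.ElementTree as ET

# Namespace declared on <record> in the MARC21 "slim" export above.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def list_files(marc_xml):
    """Return (url, size_in_bytes, md5_hex) for each 856 field.

    Assumes (as in this record) that each 856 field holds $u (URL),
    $s (size in bytes) and $z ("md5:<hex>"); missing subfields become None.
    """
    root = ET.fromstring(marc_xml)
    files = []
    for field in root.findall("marc:datafield[@tag='856']", MARC_NS):
        sub = {s.get("code"): s.text for s in field.findall("marc:subfield", MARC_NS)}
        checksum = sub.get("z")
        if checksum and checksum.startswith("md5:"):
            checksum = checksum[len("md5:"):]
        size = int(sub["s"]) if sub.get("s") else None
        files.append((sub.get("u"), size, checksum))
    return files


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB archives need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_size, expected_md5):
    """True if the local file matches the size and MD5 listed in the record."""
    if expected_size is not None and os.path.getsize(path) != expected_size:
        return False  # cheap check first: a truncated download fails here
    return md5_of_file(path) == expected_md5
```

As a usage example, assuming the export above was saved as a hypothetical `record.xml` and both archives were downloaded into the current directory, `for url, size, md5 in list_files(open("record.xml", "rb").read()): print(url, verify_file(os.path.basename(url), size, md5))` reports whether each archive is intact. The size comparison runs first because it is essentially free, while hashing a 5 GB file takes noticeably longer.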