Published November 14, 2021 | Version v2
Journal article | Open access

Distribution-free complex hypothesis testing for single-cell RNA-seq differential expression analysis

  • 1. University of Bordeaux, INSERM Bordeaux Population Health Research Center, INRIA SISTM, Vaccine Research Institute
  • 2. RAND Corporation

Description

State-of-the-art methods for single-cell RNA sequencing (scRNA-seq) differential expression analysis (DEA) often rely on strong distributional assumptions that are difficult to verify in practice. Furthermore, while the increasing complexity of clinical and biological single-cell studies calls for more versatile tools, most existing methods only handle the comparison between two conditions. We propose a novel, distribution-free, and flexible approach to DEA for scRNA-seq data. This new method, called ccdf, tests the association of each gene's expression with one or several variables of interest (either continuous or discrete), while potentially adjusting for additional covariates. To test such complex hypotheses, ccdf uses a conditional independence test relying on the conditional cumulative distribution function, estimated through multiple regressions. We provide the asymptotic distribution of the ccdf test statistic, as well as a permutation test for when the number of observed cells is not sufficiently large. ccdf substantially expands the possibilities for scRNA-seq DEA studies: it achieves good statistical performance in various simulation scenarios involving complex experimental designs (i.e., beyond the two-condition comparison), while remaining competitive with state-of-the-art methods in a two-condition benchmark. We apply ccdf to a large, publicly available scRNA-seq dataset of 84,140 SARS-CoV-2-reactive CD8+ T cells to identify the genes differentially expressed across three groups of COVID-19 severity (mild, hospitalized, and ICU) while accounting for seven different cellular subpopulations.
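To make the idea of a conditional-CDF-based test more concrete, here is a minimal Python sketch under stated assumptions. It is not the ccdf software's interface, and its statistic is a simplified stand-in, not the exact ccdf statistic. For each threshold t, the indicator 1{Y <= t} is regressed by ordinary least squares on the adjustment covariates Z alone (reduced model) and on Z plus the variables of interest X (full model). The squared distance between the two fitted conditional CDFs, summed over thresholds, serves as the test statistic. Several choices are assumptions made only for illustration: the decile threshold grid, dummy coding of a discrete X and Z, and a permutation scheme that shuffles X within each level of Z (which preserves the X/Z dependence under the null). The asymptotic test described above is not reproduced here. All function names (`cdf_stat`, `permute_within`, `ccdf_like_test`) are hypothetical.

```python
import numpy as np


def one_hot(labels):
    """Dummy-code a 1-D label array, dropping the first level (reference)."""
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, 1:]).astype(float)


def cdf_stat(y, x, z, thresholds):
    """Distance between full and reduced regression estimates of P(Y <= t | .)."""
    n = y.shape[0]
    # One indicator column 1{Y <= t} per threshold t (n x K matrix).
    V = (y[:, None] <= thresholds[None, :]).astype(float)
    # Reduced model: intercept + covariates Z; full model adds the variables X.
    A_red = np.column_stack([np.ones(n), one_hot(z)])
    A_full = np.column_stack([A_red, one_hot(x)])
    fit_red = A_red @ np.linalg.lstsq(A_red, V, rcond=None)[0]
    fit_full = A_full @ np.linalg.lstsq(A_full, V, rcond=None)[0]
    # For nested least-squares fits this equals the drop in residual sum of
    # squares, summed over all thresholds.
    return float(np.sum((fit_full - fit_red) ** 2))


def permute_within(x, z, rng):
    """Shuffle x separately inside each level of the discrete covariate z."""
    xp = x.copy()
    for level in np.unique(z):
        idx = np.flatnonzero(z == level)
        xp[idx] = rng.permutation(x[idx])
    return xp


def ccdf_like_test(y, x, z, n_perm=199, seed=0):
    """Permutation p-value for H0: Y independent of X given a discrete Z."""
    rng = np.random.default_rng(seed)
    # Assumed threshold grid: the distinct deciles of the observed values.
    thresholds = np.unique(np.quantile(y, np.linspace(0.1, 0.9, 9)))
    t_obs = cdf_stat(y, x, z, thresholds)
    t_perm = np.array([cdf_stat(y, permute_within(x, z, rng), z, thresholds)
                       for _ in range(n_perm)])
    p_value = (1 + np.sum(t_perm >= t_obs)) / (n_perm + 1)
    return t_obs, float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 600
    z = rng.integers(0, 3, size=n)  # e.g. cell subpopulation (adjustment covariate)
    x = rng.integers(0, 3, size=n)  # e.g. severity group (variable of interest)
    y = rng.poisson(2.0 + 1.5 * x + z)  # simulated counts depending on x and z
    print(ccdf_like_test(y, x, z))
```

Permuting within strata, instead of over all cells, is what keeps the test targeted at conditional independence: a global shuffle would also break any association between X and Z and could flag genes that depend only on the cell subpopulation. With a continuous Z, this simple stratified scheme no longer applies, and a different resampling strategy would be needed.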

Files

GSE153931_cd8_t24_processed_data_annotations.txt

Files (1.3 GB)

Checksum                                 Size
md5:c0b1634d4d06b504c914430def250bac     12.1 kB
md5:d0365abc9ca2276e1018bccbf80d07e3     12.9 kB
md5:012287af3e4b7cb68c83293401ff4948     1.2 GB
md5:9e694244cb00bd4b1d04a4605364d74c     2.1 kB
md5:89dca67f43eb8e2ec776c2ebb051f365     47.5 kB
md5:ec193412167138132ebb249ad1100c46     56.0 MB
md5:ecf531f78450e95d7853ba3ba7ede868     36.4 MB
md5:62c18bffd98a6455fae2560cdc0c1b4e     1.3 MB
md5:3e2f2ff7f0e12d381200e1cf0d6160d7     1.1 kB
md5:db6d43dc514bef2d2f230c40d536f34d     462 Bytes
md5:4acdcdeb668f07294a10a29ccb09258e     7.1 kB
md5:8ae42adaea8435c5ce632564b2b17a6e     8.7 kB