There is a newer version of the record available.

Published May 1, 2024 | Version v1
Dataset Open

Supplementary Material for the paper entitled "Scalable nonparametric clustering with unified marker gene selection for single-cell RNA-seq data"

  • 1. Microsoft (United States)
  • 2. Brown University

Description

This repository contains the supplementary tables for the manuscript entitled "Scalable nonparametric clustering with unified marker gene selection for single-cell RNA-seq data".

Clustering is a common way to identify cell types in single-cell RNA-sequencing (scRNA-seq) data. Unfortunately, current methods (i) require users to make human-in-the-loop decisions, which adds significant runtime to bioinformatic analyses, and (ii) use the same data twice, once to cluster cells and again to test for differentially expressed genes, which can increase the number of false discoveries. In this work, we overcome these limitations with NCLUSION: a Bayesian nonparametric method that simultaneously clusters cells and selects marker genes. NCLUSION operates without user-defined heuristics to set model parameters and leverages variational expectation-maximization (EM) for posterior inference, which allows it to scale well up to 1 million cells. By analyzing publicly available datasets, we illustrate that NCLUSION matches the state-of-the-art clustering performance of competing approaches, achieves improved computational efficiency, and directly enables identification of biologically relevant gene sets driving cluster definitions.

Files

Files (6.0 MB)

Checksum Size
md5:d5f37b78101dcf2a256cd7cc30f0a1ab 48.5 kB
md5:e41ff01c03844535364e6670953def3c 5.2 MB
md5:42a272769cf0d50cae95bbc89c336045 61.2 kB
md5:763b7ebf010f46304e9c65ea836bae87 725.3 kB
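
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch in Python using only the standard library. The helper names `md5_hex` and `is_listed` are illustrative and not part of this record or of Nclusion.jl. Because the listing does not show file names, the sketch only checks whether a file's checksum matches any of the four listed values.

```python
import hashlib

# MD5 checksums copied from the Files section of this record.
# File names are not shown in the listing, so no name-to-checksum
# mapping is assumed here.
LISTED_MD5 = {
    "d5f37b78101dcf2a256cd7cc30f0a1ab",
    "e41ff01c03844535364e6670953def3c",
    "42a272769cf0d50cae95bbc89c336045",
    "763b7ebf010f46304e9c65ea836bae87",
}


def md5_hex(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so large files do not need to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path):
    """Return True if the file's MD5 digest matches one listed in this record."""
    return md5_hex(path) in LISTED_MD5
```

For example, calling `is_listed("path/to/downloaded_file")` (a hypothetical path) returns `True` only when the file's digest matches one of the listed checksums. MD5 is adequate for detecting accidental corruption during download, but it is not a defense against deliberate tampering.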

Additional details

Funding

David and Lucile Packard Foundation

Software

Repository URL
https://github.com/microsoft/Nclusion.jl
Programming language
Julia

References

  • C. Nwizu, M. Hughes, M.L. Ramseier, A. Navia, A.K. Shalek, N. Fusi, S. Raghavan, P.S. Winter, A.P. Amini, and L. Crawford. Scalable nonparametric clustering with unified marker gene selection for single-cell RNA-seq data. bioRxiv. 2024.02.11.579839.