Summary

scDetect is an ensemble learning method for classifying cell types in single-cell RNA sequencing data across different platforms. It combines a gene expression rank-based method with a probability-based, majority-vote ensemble of machine-learning predictors.

To predict tumor cells in single-cell RNA-seq data more accurately, we developed scDetect-Cancer, a classification framework that incorporates cell copy number information and epithelial origin information into the classification.
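To make the two ideas in this summary concrete, the following is a minimal base-R sketch of within-cell rank transformation and majority voting. It is only an illustration under our own assumptions: the toy matrix, the classifier names (clf1 to clf3) and the helper majority_vote are hypothetical and are not taken from the scDetect source code.

```r
# Toy expression matrix: genes in rows, cells in columns (values are made up).
expr <- matrix(
  c(5, 0, 2,
    1, 8, 3,
    9, 4, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2", "cell3"))
)

# Rank genes within each cell, so that only the ordering of genes matters
# and platform-specific scaling of the raw values is removed.
rank_matrix <- apply(expr, 2, rank, ties.method = "average")

# Hypothetical predictions from three classifiers for the three cells.
votes <- rbind(
  clf1 = c("alpha", "beta", "beta"),
  clf2 = c("alpha", "beta", "alpha"),
  clf3 = c("beta",  "beta", "alpha")
)

# Majority vote: keep the most frequent label for each cell (column).
majority_vote <- function(x) names(which.max(table(x)))
consensus <- apply(votes, 2, majority_vote)
```

Ranks depend only on the ordering of genes within a cell, which is why rank-based features are a common choice when the training and test data come from different platforms.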

Application of scDetect

First, we load the scDetect and Seurat packages.

library("scDetect")
library("Seurat")

We will work with single-cell data from two human pancreas datasets. The “Muraro” dataset was generated on the CEL-Seq2 platform, and the “Xin” dataset was generated on the SMARTer platform.

The count matrices and cell type labels of the test data can be obtained here.

Read the gene expression data and cell type labels.

# Xin human pancreas dataset #
xin<-counts(xin_test)
xin_lable<-xin_test$label
# Muraro human pancreas dataset #
muraro<-counts(muraro_test)
muraro_lable<-muraro_test$label
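
Before prediction, it can be useful to check that the training and test matrices share gene names and that there is one label per training cell. The sketch below uses small made-up matrices (train, test and train_lable are hypothetical stand-ins for muraro, xin and muraro_lable). Whether scDetect performs the gene matching internally is not stated here, so we treat this as an optional sanity check.

```r
# Toy stand-ins for the training and test count matrices (genes x cells).
train <- matrix(1:6, nrow = 3,
                dimnames = list(c("INS", "GCG", "SST"), c("t1", "t2")))
test <- matrix(1:6, nrow = 3,
               dimnames = list(c("GCG", "SST", "PPY"), c("v1", "v2")))
train_lable <- c("beta", "alpha")

# Genes present in both datasets.
shared_genes <- intersect(rownames(train), rownames(test))

# Restrict both matrices to the shared genes, in the same order.
train_shared <- train[shared_genes, , drop = FALSE]
test_shared <- test[shared_genes, , drop = FALSE]

# One label is expected per training cell (column).
labels_match <- length(train_lable) == ncol(train_shared)
```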

Prediction

To make scDetect easy to use, all steps are integrated into a single function, scDetect.

Here, we use the Muraro pancreas dataset as the training dataset to predict the cell types in the Xin pancreas dataset.

# Using Muraro dataset as the training dataset #
# Prediction #
prediction_results<-scDetect(vali_set_matrix = xin, train_set_matrix = muraro, train_set_lable = muraro_lable,p_value=0.2)

We obtain a table showing the prediction results and detailed information.

The prediction results of scDetect include four columns:

predict_lable: the cell type with the highest predict score;

predict_score: the highest predict score, i.e. the score of the cell type in predict_lable;

pvalue: p value of the predict score, based on permutation analysis;

final_predict_lable: the final predicted cell type, based on both the predict score and the p value.
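
To illustrate how a permutation-based p value and a final label can relate to each other, here is a generic sketch. It is not the scDetect implementation: the null scores are random stand-ins, and the helper assign_final_label, its cutoff rule and the fallback label "unassigned" are our own assumptions (the cutoff of 0.2 simply mirrors the p_value argument used above).

```r
set.seed(1)

# Observed score for one cell, and scores obtained from permuted data
# (random numbers stand in for a real permutation analysis).
observed_score <- 0.9
null_scores <- runif(999)

# Empirical p value: fraction of null scores at least as large as the
# observed one, with a +1 correction so the p value is never exactly zero.
p_value <- (1 + sum(null_scores >= observed_score)) / (1 + length(null_scores))

# Hypothetical rule: keep the predicted label only if the p value passes
# the cutoff, otherwise fall back to a placeholder label.
assign_final_label <- function(predict_lable, pvalue, cutoff = 0.2,
                               fallback = "unassigned") {
  ifelse(pvalue <= cutoff, predict_lable, fallback)
}

final_lable <- assign_final_label(c("alpha", "beta"), c(0.01, 0.6))
```

In this toy example, the second cell has a p value above the cutoff, so it receives the fallback label instead of "beta".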

prediction_results[1:20,]

Evaluate the prediction results.

evaluate_results<-evaluate(xin_lable,prediction_results$final_predict_lable)

Accuracy of the cell type prediction results.

#Accuracy
evaluate_results$Acc

Confusion matrix of the cell type prediction results.

# Confusion matrix
evaluate_results$Conf
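
For readers who want to see what these two quantities mean, the following base-R sketch computes an accuracy and a confusion matrix for made-up labels. It is an independent illustration and does not reproduce the internals of evaluate; the vectors true_lable and pred_lable are hypothetical.

```r
# Made-up true and predicted labels for five cells.
true_lable <- c("alpha", "alpha", "beta", "delta", "beta")
pred_lable <- c("alpha", "beta",  "beta", "delta", "beta")

# Accuracy: fraction of cells whose predicted label equals the true label.
accuracy <- mean(true_lable == pred_lable)

# Confusion matrix: rows are true labels, columns are predicted labels.
# Shared factor levels keep the table square even if a label is never predicted.
lv <- sort(unique(c(true_lable, pred_lable)))
confusion <- table(
  truth = factor(true_lable, levels = lv),
  predicted = factor(pred_lable, levels = lv)
)
```

Here four of the five cells are predicted correctly, and the single error appears in the off-diagonal cell for true label alpha and predicted label beta.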

Application of scDetect-Cancer

For single-cell RNA-seq data from tumor samples, we first load the scDetect and Seurat packages.

library("scDetect")
library("Seurat")

We will work with single cell data from a test melanoma dataset.

The count matrices and cell type labels of the test data can be obtained here.

Read the gene expression data and cell type labels.

# Melanoma reference dataset #
mela_ref<-counts(melanoma_ref)
mela_ref_lable<-melanoma_ref$label

# Melanoma test dataset #
mela_test<-counts(melanoma_test)

Prediction

To make scDetect-Cancer easy to use, all steps are integrated into a single function, scDetect-Cancer.

Here, we use a melanoma reference dataset (without tumor cells) as the training dataset to predict the cell types in a melanoma test dataset.

The gene position file used for single-cell copy number variation analysis and the gene list file used for epithelial score analysis can be obtained here.
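
As a rough illustration of how such files are typically used, the sketch below orders genes by genomic position (a common first step in copy number analysis) and computes a simple per-cell score from a gene list. The file layouts, the column names (gene, chr, start), the gene names and the use of a plain mean as the score are all assumptions made for this example; the actual formats and calculations used by scDetect-Cancer are not described here.

```r
# Hypothetical gene position table (one row per gene).
gene_pos <- data.frame(
  gene = c("g3", "g1", "g2", "g4"),
  chr = c("1", "1", "1", "1"),
  start = c(300, 100, 200, 400)
)

# Toy expression matrix (genes x cells) with made-up values.
expr <- matrix(
  c(0, 2, 4,
    1, 1, 1,
    3, 0, 6,
    2, 2, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("g3", "g1", "g2", "g4"), c("c1", "c2", "c3"))
)

# Order genes along the chromosome before any position-based analysis.
ordered_pos <- gene_pos[order(gene_pos$chr, gene_pos$start), ]
expr_ordered <- expr[ordered_pos$gene, , drop = FALSE]

# Hypothetical epithelial gene list; the score is the mean expression
# of the listed genes in each cell.
epithelial_genes <- c("g1", "g3")
epithelial_score <- colMeans(expr[epithelial_genes, , drop = FALSE])
```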

# Create Temporary directory #
output_dir<-tempdir()

We obtain a list that includes the prediction results and detailed information.

The prediction results:

scDetect_Cancer_results$lable[1:10]

The detailed information:

scDetect_Cancer_results$detail_info[1:10,]

sessionInfo()
#> R version 3.6.3 (2020-02-29)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 7 x64 (build 7600)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Chinese (Simplified)_China.936 
#> [2] LC_CTYPE=Chinese (Simplified)_China.936   
#> [3] LC_MONETARY=Chinese (Simplified)_China.936
#> [4] LC_NUMERIC=C                              
#> [5] LC_TIME=Chinese (Simplified)_China.936    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_3.6.3  magrittr_1.5    tools_3.6.3     htmltools_0.4.0
#>  [5] yaml_2.2.1      Rcpp_1.0.3      stringi_1.4.6   rmarkdown_2.1  
#>  [9] knitr_1.28      stringr_1.4.0   xfun_0.12       digest_0.6.25  
#> [13] rlang_0.4.5     evaluate_0.14