10.1038/s41467-020-20288-9
https://zenodo.org/records/4312852
oai:zenodo.org:4312852
Cobos, F.
F.
Cobos
0000-0002-8816-9243
Ghent University
Alquicira-Hernandez, J.
J.
Alquicira-Hernandez
0000-0002-9049-7780
Garvan Institute of Medical Research
Powell, J.
J.
Powell
0000-0002-5070-4124
Garvan Institute of Medical Research
Mestdagh, P.
P.
Mestdagh
0000-0001-7821-9684
Cancer Research Institute Ghent (CRIG)
Peter, K.
K.
Peter
0000-0002-7726-5096
Ghent University
Benchmarking of cell type deconvolution pipelines for transcriptomics data
Zenodo
2020
2020-12-02
eng
https://zenodo.org/communities/ipc
https://zenodo.org/communities/eu
Creative Commons Attribution 4.0 International
Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, preprocessing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methodologies and those that use scRNA-seq data as reference perform best when applied to data in linear scale, and the choice of normalization has a dramatic impact on some, but not all, methods. Overall, methods that use scRNA-seq data have comparable performance to the best-performing bulk methods, whereas semi-supervised approaches show higher error values. Moreover, failure to include cell types in the reference that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance.
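The pseudo-bulk idea mentioned in the description can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `make_pseudo_bulk` and `rmse`, the toy count matrix, and the choice of RMSE as the error measure are all assumptions made for illustration. The sketch sums linear-scale counts of randomly sampled cells into one mixture and records the known cell type proportions, which is what makes it possible to score a deconvolution estimate against ground truth.

```python
import random
from collections import Counter


def make_pseudo_bulk(counts, labels, n_cells, rng):
    """Build one pseudo-bulk mixture (hypothetical helper, not from the record).

    counts: list of per-cell gene-count lists (cells x genes), linear scale
    labels: cell type label for each cell, same order as counts
    n_cells: number of cells to sample without replacement
    rng: a random.Random instance, passed in for reproducibility
    Returns (bulk_profile, true_proportions).
    """
    picked = rng.sample(range(len(counts)), n_cells)
    n_genes = len(counts[0])
    # Sum raw counts per gene across the sampled cells.
    bulk = [sum(counts[i][g] for i in picked) for g in range(n_genes)]
    # Ground-truth proportions are the fraction of sampled cells per type.
    tally = Counter(labels[i] for i in picked)
    proportions = {t: tally[t] / n_cells for t in sorted(set(labels))}
    return bulk, proportions


def rmse(estimated, truth):
    """Root-mean-square error over the cell types present in `truth`.

    Cell types missing from `estimated` are treated as proportion 0.
    """
    sq = [(estimated.get(t, 0.0) - truth[t]) ** 2 for t in truth]
    return (sum(sq) / len(sq)) ** 0.5


# Toy data (assumed): 6 cells, 3 genes, 3 cell types.
toy_counts = [
    [10, 0, 1],
    [8, 1, 0],
    [0, 12, 2],
    [1, 9, 3],
    [0, 1, 15],
    [2, 0, 11],
]
toy_labels = ["A", "A", "B", "B", "C", "C"]

bulk, truth = make_pseudo_bulk(toy_counts, toy_labels, 4, random.Random(0))
```

Any deconvolution method's estimated proportions could then be compared with `truth` through `rmse`. An estimate that omits a cell type present in the mixture is penalized for the missing proportion.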
European Commission
10.13039/501100000780
826121
individualizedPaediatricCure: Cloud-based virtual-patient models for precision paediatric oncology