Assigning transcriptomic subtypes to CLL samples using nanopore RNA-sequencing and self-organizing maps - dataset
Description
The dataset contains the raw and intermediate files and the scripts required to reproduce the results associated with the manuscript "Assigning transcriptomic subtypes to CLL samples using nanopore RNA-sequencing and self-organizing maps". We demonstrate that integrating publicly available short-read data with in-house generated ONT data and applying machine learning approaches enables characterization of the CLL transcriptome landscape, identification of clinically relevant molecular subtypes, and assignment of these subtypes to nanopore-sequenced samples.
-------------------------------------------------------------------------------------------------------------------------------------------
The ONT_Projection_paper.zip archive contains the scripts and data used to generate the results for the initial submission of the paper.
The archive contains the following:
Scripts
Projection_CML_CLL_ONT.Rproj - project workspace and metadata about available files and datasets.
CLL_ONT_4_pub.Rmd - R Markdown file with complete analysis workflow. It includes scripts for data conversion and analysis.
test_SVM_ONT.r - R script for supervised projection of ONT sequencing data onto the CLL Map SOM landscape and assignment of transcriptome subtypes.
phenomap.R - R script for generation of phenotype maps.
SOM2jpeg.R - R script for saving SOM portrait image.
assign_SOM_class.R - R script for assigning transcriptome subtypes to nanopore sequencing samples.
Raw Data
ONT_exp_matrix_w_samplenames.csv - raw count matrix of nanopore sequencing samples
Sample_metadata.csv - nanopore sequencing sample metadata
cllmap_rnaseq_tpms_full.csv - TPM value matrix of CLL Map Project [R1]
cllmap_participants.csv - metadata of CLL Map Project
Intermediate files
CLL_MAP_Knisbacher_2022.Rdata - Rdata object with CLL Map TPM matrix and metadata
CLL_MAP_Knisbacher_2022_adj.Rdata - Rdata object with CLL Map batch-corrected TPM matrix and metadata
ont_merged_counts.Rdata - Rdata object with raw count matrix of nanopore sequencing samples
bmTable.Rdata - Rdata object with Ensembl ID to official gene symbol conversion table
CLL-ONT.Rdata - Rdata object with TPM value matrix (gene symbols as row names) of nanopore sequencing samples
metadata.pred - Folder with supSOM image of ONT samples
mean.m.tr.pred - Folder with group SOM images of CLL map transcriptomic subtypes.
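The file list above includes both a raw count matrix and a TPM matrix for the nanopore samples, but it does not state how one was derived from the other. As an illustration only, the sketch below assumes a common long-read convention in which each read represents one transcript, so counts are scaled to sum to one million per sample without length normalization. The function name and the nested-dictionary layout are hypothetical and are not taken from the archive's R scripts.

```python
def counts_to_per_million(counts_by_sample):
    """Scale each sample's raw counts so that they sum to one million.

    counts_by_sample: {sample_name: {gene_id: raw_count}}
    Assumption: no gene-length normalization (one read ~ one transcript).
    """
    scaled = {}
    for sample, gene_counts in counts_by_sample.items():
        total = sum(gene_counts.values())
        if total == 0:
            # Avoid division by zero for an empty sample.
            scaled[sample] = {gene: 0.0 for gene in gene_counts}
            continue
        scaled[sample] = {
            gene: count * 1e6 / total for gene, count in gene_counts.items()
        }
    return scaled
```

The workflow in CLL_ONT_4_pub.Rmd remains the authoritative description of the conversion actually used.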
Result files
results.CLLMAPadj_overExp_2 - Results - Folder with the results of CLL map transcriptomic portrayal using oposSOM pipeline [R2].
all_significant_GS.csv - Functional annotation of SOM gene modules (spots). This file contains significant (FDR-adjusted) gene sets.
specific_GS.csv - Functional annotation of SOM gene modules (spots). This file contains significant (FDR-adjusted) gene sets specific for a given spot.
References
R1. CLL-map Portal. https://cllmap.org/. Last accessed December 20, 2024
R2. Löffler-Wirth H, Kalcher M, Binder H. oposSOM: R-package for high-dimensional portraying of genome-wide expression landscapes on bioconductor. Bioinformatics. 2015 Oct 1;31(19):3225-7. doi: 10.1093/bioinformatics/btv342. Epub 2015 Jun 10. PMID: 26063839.
---------------------------------------------------------------------------------------------------------------------------------------------
The ONT_Projection_paper_revision.zip archive contains additional scripts created in response to the Reviewers' comments during the first round of revisions.
CLL_ONT_revision.Rmd - An R Markdown file containing revision-related scripts.
hr_table_ffs.csv and hr_table_os.csv - Hazard ratio tables from the multivariable Cox regression models for failure-free survival and overall survival, with PAT types, gender, and spot I as independent variables.
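For readers less familiar with Cox regression output, a hazard ratio is the exponential of a model coefficient, and an approximate 95% confidence interval is obtained by exponentiating the coefficient plus or minus 1.96 standard errors. The sketch below shows that standard relationship. The function name is hypothetical, and the column layout of the two CSV tables is not described here, so no column names are assumed.

```python
import math


def hazard_ratio(coef, std_err, z=1.96):
    """Return (HR, lower, upper) for a Cox coefficient and its standard error.

    z=1.96 gives an approximate 95% Wald confidence interval.
    """
    hr = math.exp(coef)
    lower = math.exp(coef - z * std_err)
    upper = math.exp(coef + z * std_err)
    return hr, lower, upper
```

A hazard ratio above 1 indicates a higher hazard relative to the reference level of that variable, and an interval that excludes 1 corresponds to significance at roughly the 5% level.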
Files (1.8 GB)
| Size | Checksum |
|---|---|
| 1.8 GB | md5:a4b24e9a28a119e4e967fd2b2d07af9d |
| 5.1 kB | md5:b35e74aa2dd6b004d419b1df88268658 |
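The md5 checksums listed above can be used to confirm that a downloaded archive is complete and uncorrupted. A minimal sketch using only the Python standard library follows. The helper names are illustrative, and the file is read in chunks so that the 1.8 GB archive does not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against 'md5:<hex>' or a bare hex digest."""
    # Drop an optional 'md5:' prefix and ignore letter case.
    expected_hex = expected.lower().split(":", 1)[-1]
    return md5_of_file(path) == expected_hex
```

Call `matches_checksum` with the local path of a downloaded file and the checksum string shown for that file. A result of `False` suggests the download should be repeated.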