Published December 17, 2024 | Version v1
Dataset Open

Assigning transcriptomic subtypes to CLL samples using nanopore RNA-sequencing and self-organizing maps - dataset

  • 1. Institute of Molecular Biology NAS RA

Description

The dataset contains the raw and intermediate files and the scripts required to reproduce the results associated with the manuscript "Assigning transcriptomic subtypes to CLL samples using nanopore RNA-sequencing and self-organizing maps". Here, we demonstrate that integrating publicly available short-read data with in-house generated ONT data, along with the application of machine learning approaches, enables the characterization of the CLL transcriptome landscape, the identification of clinically relevant molecular subtypes, and the assignment of these subtypes to nanopore-sequenced samples.

-------------------------------------------------------------------------------------------------------------------------------------------

The ONT_Projection_paper.zip archive contains the scripts and data used to generate the results for the initial submission of the paper.

The contents of the data archive are as follows:

Scripts

Projection_CML_CLL_ONT.Rproj - project workspace and metadata about available files and datasets.

CLL_ONT_4_pub.Rmd -  R Markdown file with complete analysis workflow. It includes scripts for data conversion and analysis. 

test_SVM_ONT.r - R script for supervised projection of ONT sequencing data onto the CLL Map SOM landscape and assignment of transcriptomic subtypes.

phenomap.R - R script for generation of phenotype maps.

SOM2jpeg.R - R script for saving a SOM portrait image.

assign_SOM_class.R - R script for assigning transcriptomic subtypes to nanopore sequencing samples.

 

Raw Data

ONT_exp_matrix_w_samplenames.csv - raw count matrix of nanopore sequencing samples 

Sample_metadata.csv - nanopore sequencing sample metadata

cllmap_rnaseq_tpms_full.csv - TPM value matrix of the CLL Map Project [R1]

cllmap_participants.csv - metadata of CLL Map Project

 

Intermediate files

CLL_MAP_Knisbacher_2022.Rdata - Rdata object with CLL Map TPM matrix and metadata

CLL_MAP_Knisbacher_2022_adj.Rdata - Rdata object with CLL Map batch-corrected TPM matrix and metadata

ont_merged_counts.Rdata - Rdata object with raw count matrix of nanopore sequencing samples

bmTable.Rdata - Rdata object with the ENSEMBL ID to official gene symbol conversion table

CLL-ONT.Rdata - Rdata object with TPM value matrix (gene symbols as row names) of nanopore sequencing samples

metadata.pred - Folder with supSOM image of ONT samples

mean.m.tr.pred - Folder with group SOM images of CLL map transcriptomic subtypes.
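The intermediate files above imply two conversion steps between ont_merged_counts.Rdata and CLL-ONT.Rdata: scaling raw counts to expression values and mapping ENSEMBL IDs to gene symbols. The authors' own implementation is in R (CLL_ONT_4_pub.Rmd); the Python sketch below only illustrates the general idea. It assumes a length-free per-million scaling and sums IDs that share a symbol. Both choices, as well as the function names and the toy IDs, are assumptions and may differ from what the archive's scripts do.

```python
from collections import defaultdict


def counts_to_per_million(counts):
    """Scale one sample's raw counts so that they sum to one million.

    Assumption: no transcript-length normalization is applied.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    return {gene_id: c * 1e6 / total for gene_id, c in counts.items()}


def collapse_to_symbols(values, id_to_symbol):
    """Sum the values of IDs mapping to the same symbol; drop unmapped IDs."""
    collapsed = defaultdict(float)
    for gene_id, value in values.items():
        symbol = id_to_symbol.get(gene_id)
        if symbol:
            collapsed[symbol] += value
    return dict(collapsed)


# Toy example with hypothetical IDs and symbols (not taken from the dataset).
raw_counts = {"ENSG_A": 30, "ENSG_B": 50, "ENSG_C": 20, "ENSG_D": 0}
conversion = {"ENSG_A": "GENE1", "ENSG_B": "GENE1", "ENSG_C": "GENE2"}

scaled = counts_to_per_million(raw_counts)
by_symbol = collapse_to_symbols(scaled, conversion)
```

Summing is only one way to resolve duplicate symbols; keeping the most highly expressed ID or averaging are equally common, and the R Markdown file is the reference for which rule was actually used.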

 

Result files

results.CLLMAPadj_overExp_2 - Folder with the results of CLL Map transcriptomic portrayal using the oposSOM pipeline [R2].

all_significant_GS.csv - Functional annotation of SOM gene modules (spots). This file contains significant (FDR-adjusted) gene sets.

specific_GS.csv - Functional annotation of SOM gene modules (spots). This file contains significant (FDR-adjusted) gene sets specific for a given spot.

 

References

R1. CLL-map Portal. https://cllmap.org/. Last accessed December 20, 2024

R2. Löffler-Wirth H, Kalcher M, Binder H. oposSOM: R-package for high-dimensional portraying of genome-wide expression landscapes on bioconductor. Bioinformatics. 2015 Oct 1;31(19):3225-7. doi: 10.1093/bioinformatics/btv342. Epub 2015 Jun 10. PMID: 26063839.

---------------------------------------------------------------------------------------------------------------------------------------------

The ONT_Projection_paper_revision.zip archive contains additional scripts created in response to the reviewers' comments during the first round of revisions.

CLL_ONT_revision.Rmd - An R Markdown file containing revision-related scripts. 

hr_table_ffs.csv and hr_table_os.csv - Hazard ratio tables from the multivariable Cox regression models for failure-free survival and overall survival, respectively, with PAT types, gender, and spot I as independent variables.
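For readers unfamiliar with hazard ratio tables: in a Cox model, the hazard ratio of a covariate is the exponential of its regression coefficient, and an approximate 95% confidence interval follows from the coefficient's standard error. The Python sketch below shows this relationship. The function name and the example numbers are hypothetical, and the column layout of the two CSV files is not assumed here.

```python
import math


def hazard_ratio(beta, se, z=1.96):
    """Return (HR, lower, upper) for a Cox coefficient and its standard error.

    HR = exp(beta); the Wald interval is exp(beta -/+ z * se),
    with z = 1.96 giving an approximate 95% interval.
    """
    hr = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return hr, lower, upper


# Hypothetical coefficient and standard error, for illustration only.
hr, ci_low, ci_high = hazard_ratio(beta=0.693, se=0.25)
```

A hazard ratio above 1 indicates a higher event rate for that covariate level relative to the reference, and an interval that excludes 1 corresponds to significance at roughly the 5% level.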


Files

ONT_Projection_paper.zip

Files (1.8 GB)

md5:a4b24e9a28a119e4e967fd2b2d07af9d (1.8 GB)
md5:b35e74aa2dd6b004d419b1df88268658 (5.1 kB)
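The MD5 checksums listed above can be used to confirm that a download completed without corruption. A minimal Python sketch follows, assuming the archive has already been saved locally; the helper names are illustrative and not part of the dataset.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, listed_checksum):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_record("path/to/downloaded_archive.zip", "md5:<checksum from the listing>")` should return `True` for an intact download. Reading in chunks keeps memory use low for the 1.8 GB archive.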