RNAseq of Mus musculus lung tissue

Perdijk, Olaf; Macowan, Matthew

doi:10.5281/zenodo.11238291

Published May 21, 2024 | Version v1

Dataset Open

1. Monash University

RNAseq was performed on mouse lung tissue samples to assess the effects of antibiotic treatment on airway gene transcriptional programs. The effect of antibiotic treatment was primarily investigated in conjunction with the age at treatment onset: some mice started treatment at 3 weeks old (young), while others started at 8 weeks old (old). Secondly, comparisons were made after the mice were cohoused (to normalise the gut microbiome) following their course of antibiotics. These experiments are referred to as "Abx-treated" and "Cohoused" in the accompanying metadata file.

The raw RNA sequencing fastq files were processed using the NF-CORE/rnaseq pipeline. Briefly, reads undergo initial QC with FastQC, UMIs are extracted with UMI-tools, and reads undergo adapter removal and quality trimming with Trim Galore!, after which genomic contaminants are removed via BBSplit and ribosomal components are removed via SortMeRNA. Reads are aligned and quantified using STAR and Salmon in combination, then sorted and indexed using SAMtools and deduplicated via UMI-tools. Outputs are provided in .rds file format as SummarizedExperiment objects, with bias-corrected gene counts without an offset (perdijk_abx_study_2024_RNA.gene_counts_length_scaled.rds). A .csv file of the corrected counts table is also provided (perdijk_abx_study_2024_RNA.gene_counts_length_scaled.csv).

There are two matrices provided in the .rds file: counts and abundance.

The abundance matrix is the scaled and normalised transcripts per million (TPM) abundance. TPM explicitly erases information about library size. That is, it estimates the relative abundance of each transcript as a proportion of the total population of transcripts sampled in the experiment. TPM can therefore be thought of as a partition of unity: a fraction of the total expression (whatever that may be) is assigned to each transcript, regardless of whether the library is 10M fragments or 100M fragments.

The counts matrix is a re-estimated counts table that provides count-level data compatible with downstream tools such as DESeq2.

The tximport package has a single function for importing transcript-level estimates. The type argument specifies which software was used for estimation. A simple list of matrices ("abundance", "counts", and "length") is returned, in which the transcript-level information is summarized to the gene level. Typically, abundance is provided by the quantification tools as TPM (transcripts per million), the counts are estimated counts (possibly fractional), and the "length" matrix contains the effective gene lengths. The "length" matrix can be used to generate an offset matrix for downstream gene-level differential analysis of count matrices.
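
To make the relationship between these matrices concrete, the following minimal Python sketch computes TPM for a toy sample and then rescales it back to count level. It assumes the counts were re-estimated with a tximport-style "lengthScaledTPM" rescaling (suggested by the file names, but not confirmed in this record); the function names, toy counts, and effective lengths are all hypothetical.

```python
def tpm(counts, eff_len):
    """TPM for one sample from per-gene counts and effective lengths."""
    # Reads per base of effective length removes the gene-length effect.
    rates = [c / l for c, l in zip(counts, eff_len)]
    total = sum(rates)
    # Dividing by the sample total removes the library-size effect.
    return [r / total * 1e6 for r in rates]


def length_scaled_counts(abundance, mean_len, library_size):
    """Assumed 'lengthScaledTPM'-style counts for one sample.

    mean_len is the per-gene effective length averaged across samples;
    library_size is the sum of the sample's original estimated counts.
    """
    scaled = [a * l for a, l in zip(abundance, mean_len)]
    factor = library_size / sum(scaled)
    return [s * factor for s in scaled]


# Toy data (hypothetical): three genes, one sample sequenced at two depths.
eff_len = [1000.0, 2000.0, 500.0]
shallow = [100.0, 400.0, 50.0]
deep = [c * 10 for c in shallow]  # same sample, 10x more fragments

tpm_shallow = tpm(shallow, eff_len)
tpm_deep = tpm(deep, eff_len)

# Rescale TPM back to count level using the deep library size.
counts_deep = length_scaled_counts(tpm_deep, eff_len, sum(deep))
```

Here `tpm_shallow` and `tpm_deep` agree, which is the sense in which TPM erases library size, while `counts_deep` sums to the original library size again. Because the toy effective lengths are identical across samples, the rescaled counts simply recover the input counts; with real data the per-sample lengths differ, and the rescaling absorbs that difference so that no separate offset is needed.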

Files (33.0 MB)

Name	MD5	Size
perdijk_abx_study_2024_RNA.gene_counts_length_scaled.csv	94ca87a71fdb845e550957dfbb8eca72	18.5 MB
perdijk_abx_study_2024_RNA.gene_counts_length_scaled.rds	01d1480ff8e3f7b9c79ca4be998dff66	14.5 MB
perdijk_abx_study_2024_RNA.metadata.csv	1f377610d4a5f5c02b27d39f7061fc1d	2.1 kB
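
The MD5 checksums listed for these files can be used to confirm that a download is intact. The sketch below is one way to do this in Python using only the standard library; the function names and the assumption that the files sit in a single local directory are illustrative and not part of the record.

```python
import hashlib
from pathlib import Path

# Checksums as listed in the file table of this record.
EXPECTED_MD5 = {
    "perdijk_abx_study_2024_RNA.gene_counts_length_scaled.csv": "94ca87a71fdb845e550957dfbb8eca72",
    "perdijk_abx_study_2024_RNA.gene_counts_length_scaled.rds": "01d1480ff8e3f7b9c79ca4be998dff66",
    "perdijk_abx_study_2024_RNA.metadata.csv": "1f377610d4a5f5c02b27d39f7061fc1d",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    # Assumes the files were downloaded into the current working directory.
    for name, ok in verify_downloads().items():
        status = {True: "OK", False: "MISMATCH", None: "missing"}[ok]
        print(f"{status}\t{name}")
```

A result of "MISMATCH" usually indicates an incomplete or corrupted download, in which case the file should be fetched again before analysis.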

Additional details

Subtitle: NF-CORE/rnaseq pipeline outputs

