RNAseq of Mus musculus lung tissue
Authors/Creators
Description
RNAseq was performed on mouse lung tissue samples to assess the effects of antibiotic treatment on airway gene transcriptional programs. The effect of antibiotic treatment was investigated primarily in relation to the age at treatment onset: some mice started treatment at 3 weeks old (young), while others started at 8 weeks old (old). In addition, comparisons were made after the mice had been cohoused (to normalise the gut microbiome) following their course of antibiotics. These experiments are referred to as "Abx-treated" and "Cohoused" in the accompanying metadata file.
The raw RNA sequencing FASTQ files were processed using the nf-core/rnaseq pipeline. Briefly, reads undergo initial QC with FastQC, UMIs are extracted with UMI-tools, and adapters and low-quality bases are trimmed with Trim Galore!. Genomic contaminants are then removed with BBSplit, and ribosomal RNA reads are removed with SortMeRNA. Reads are aligned with STAR and quantified with Salmon; the alignments are sorted and indexed with SAMtools and deduplicated with UMI-tools. Outputs are provided in .rds file format as SummarizedExperiment objects containing bias-corrected gene counts that do not require an offset (perdijk_abx_study_2024_RNA.gene_counts_length_scaled.rds). A .csv file of the corrected counts table is also provided (perdijk_abx_study_2024_RNA.gene_counts_length_scaled.csv).
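For users who want to inspect the .csv counts table outside R, the following is a minimal sketch of a reader using only the Python standard library. The column layout is an assumption, not something documented in this record: the sketch expects gene identifiers in the first column and treats every other fully numeric column as a sample. The function name read_counts_table, the sample names, and the demo values are hypothetical and only illustrate the idea.

```python
import csv
import io


def read_counts_table(handle):
    """Parse a counts CSV into (sample_names, {gene_id: [counts, ...]}).

    Assumption: the first column holds gene identifiers. Any other column
    whose values all parse as numbers is treated as a sample; non-numeric
    columns (for example a gene-name column) are skipped.
    """
    reader = csv.reader(handle)
    header = next(reader)
    rows = [row for row in reader if row]  # drop blank lines

    def is_numeric(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    sample_cols = [
        i for i in range(1, len(header))
        if rows and all(is_numeric(row[i]) for row in rows)
    ]
    samples = [header[i] for i in sample_cols]
    counts = {row[0]: [float(row[i]) for i in sample_cols] for row in rows}
    return samples, counts


# Invented stand-in for the real file; the counts below are not real data.
demo = io.StringIO(
    "gene_id,gene_name,sample_A,sample_B\n"
    "ENSMUSG00000000001,Gnai3,1520.4,1433.0\n"
    "ENSMUSG00000000028,Cdc45,88.2,102.7\n"
)
samples, counts = read_counts_table(demo)

# With the downloaded file, the same function could be used as:
# with open("perdijk_abx_study_2024_RNA.gene_counts_length_scaled.csv", newline="") as fh:
#     samples, counts = read_counts_table(fh)
```

Values are parsed as floats rather than integers because the re-estimated counts may be fractional. The .rds file is an R object and is normally opened in R with readRDS().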
There are two matrices provided in the .rds file: counts and abundance.
- The abundance matrix holds the scaled and normalised transcripts per million (TPM) abundances. TPM explicitly erases information about library size: it estimates the abundance of each transcript relative to the total population of transcripts sampled in the experiment. TPM can therefore be thought of as a partition of unity, in which each transcript is assigned a fraction of the total expression (whatever that may be), regardless of whether the library contains 10M or 100M fragments.
- The counts matrix is a re-estimated counts table that provides count-level data compatible with downstream tools such as DESeq2.
- The tximport package has a single function for importing transcript-level estimates. The type argument specifies which software was used for estimation. The function returns a simple list of matrices, "abundance", "counts", and "length", in which the transcript-level information is summarized to the gene level. Typically, abundance is provided by the quantification tools as TPM (transcripts-per-million), the counts are estimated counts (possibly fractional), and the "length" matrix contains the effective gene lengths. The "length" matrix can be used to generate an offset matrix for downstream gene-level differential analysis of count matrices.
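The relationship between the matrices described above can be illustrated with a small sketch. It assumes the standard TPM definition (counts divided by effective length, rescaled to sum to one million per sample) and the length-scaling idea described in the tximport documentation (TPM multiplied by the average effective length, then rescaled to each sample's original library size). The function names and the toy numbers are hypothetical; this is not the pipeline's own code.

```python
import math


def tpm(counts, eff_lengths):
    """Convert one sample's estimated counts to TPM."""
    rates = [c / l for c, l in zip(counts, eff_lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def length_scaled_counts(tpm_by_sample, mean_lengths, library_sizes):
    """Rebuild count-scale values from TPM.

    Each TPM value is multiplied by the gene's effective length averaged
    over samples, then each sample is rescaled so that its column sum
    equals its original library size.
    """
    scaled = []
    for sample_tpm, lib_size in zip(tpm_by_sample, library_sizes):
        raw = [t * l for t, l in zip(sample_tpm, mean_lengths)]
        factor = lib_size / sum(raw)
        scaled.append([r * factor for r in raw])
    return scaled


# Toy data (invented): three genes, two samples that differ only in depth.
lengths = [1000.0, 2000.0, 500.0]   # effective lengths in bases
shallow = [100.0, 400.0, 50.0]      # estimated counts, small library
deep = [c * 10 for c in shallow]    # same composition, 10x the depth

tpm_shallow = tpm(shallow, lengths)
tpm_deep = tpm(deep, lengths)

# TPM ignores library size: the two samples agree up to rounding error.
same_tpm = all(math.isclose(a, b) for a, b in zip(tpm_shallow, tpm_deep))

# Length-scaled counts put the depth information back on the count scale.
rescaled = length_scaled_counts(
    [tpm_shallow, tpm_deep], lengths, [sum(shallow), sum(deep)]
)
```

In this toy case the effective lengths are identical across samples, so the length-scaled counts reproduce the input counts. When effective lengths differ between samples, the rescaled values absorb that difference, which is why such counts can be passed to count-based tools without a separate offset matrix.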
Files
perdijk_abx_study_2024_RNA.gene_counts_length_scaled.csv
Additional details
Additional titles
- Subtitle
- NF-CORE/rnaseq pipeline outputs