RNA-Sequencing Part 3: Generation and characterization of a novel mouse model of Becker Muscular Dystrophy with a deletion of exons 52 to 55
Description
Becker muscular dystrophy (BMD) is a rare X-linked recessive neuromuscular disorder, frequently caused by in-frame deletions in the DMD gene that result in the production of a truncated, yet functional, dystrophin protein. The consequences of BMD-causing in-frame deletions on the organism are difficult to predict, especially in regard to long-term prognosis. Here, we used CRISPR-Cas9 to generate a new Dmd Δ52-55 mouse model by deleting exons 52-55 in the Dmd gene, resulting in a BMD-like in-frame deletion. To delineate the long-term effects of this deletion, we studied these mice over 52 weeks by performing histology and echocardiography analyses and assessing motor functions. To further delineate the effects of the exons 52-55 in-frame deletion, we performed RNA-Seq pre- and post-exercise and identified several differentially expressed pathways that could explain the abnormal muscle phenotype observed at 52 weeks in the BMD model.
This dataset contains the results and raw data of the RNA-sequencing and transcriptomic analysis for 52-week-old exercised and non-exercised mice (4 BMD, 4 WT and 4 DMD, as indicated in the name of each file).
Due to size restrictions, this RNA-Seq dataset is published on Zenodo in 3 parts. This third part contains the data for the non-exercised mice, including the fastq files (R1 and R2) and the differentially expressed genes (tsv files). The fastq files were extracted by our team from the alignment (bam) files, as follows:
1. Starting with the original file (Number.Aligned.sortedByCoord.out.bam), using samtools, we sorted by name:
samtools sort -n Number.Aligned.sortedByCoord.out.bam -o Number.Aligned.namesorted.bam
2. We extracted the paired reads into 2 separate files for R1 and R2, and any singleton or orphaned reads into additional RS and R0 files, respectively (many of the RS and R0 files were empty and not added here due to size constraints):
samtools fastq -1 Number_R1.fastq -2 Number_R2.fastq -0 Number_R0.fastq -s Number_RS.fastq Number.Aligned.namesorted.bam
3. We compressed all of the files with gzip (‘.gz’ extension):
gzip -9 Number_R1.fastq
The .bam and RS/R0 files were not added due to size constraints but are available upon request.
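The three extraction steps above can be scripted per sample. The following is a minimal sketch, not the script used by the authors: the function names are hypothetical, and it assumes that the name-sorted BAM from step 1 is the input to `samtools fastq` in step 2 and that all four fastq outputs are compressed in step 3.

```python
import subprocess


def extraction_commands(sample: str) -> list[list[str]]:
    """Build the samtools/gzip commands for one sample prefix (e.g. "Number")."""
    coord_bam = f"{sample}.Aligned.sortedByCoord.out.bam"
    name_bam = f"{sample}.Aligned.namesorted.bam"
    fastq = {tag: f"{sample}_{tag}.fastq" for tag in ("R1", "R2", "R0", "RS")}
    return [
        # Step 1: sort by read name so that mates sit next to each other.
        ["samtools", "sort", "-n", coord_bam, "-o", name_bam],
        # Step 2: write paired reads to R1/R2 and the remaining reads to R0/RS.
        # Assumption: the name-sorted BAM is the input file.
        ["samtools", "fastq", "-1", fastq["R1"], "-2", fastq["R2"],
         "-0", fastq["R0"], "-s", fastq["RS"], name_bam],
        # Step 3: compress every fastq file at the highest gzip level.
        ["gzip", "-9", fastq["R1"], fastq["R2"], fastq["R0"], fastq["RS"]],
    ]


def run_extraction(sample: str) -> None:
    """Run the three steps in order, stopping at the first failing command."""
    for command in extraction_commands(sample):
        subprocess.run(command, check=True)
```

Calling `run_extraction("Number")` requires samtools and gzip on the PATH; `extraction_commands` alone only builds the argument lists and can be inspected without running anything.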
Upstream workflow performed by TCAG (SickKids):
2. RNA-Seq Library and Reference Genome Information
Type of library: stranded, paired end
Genome reference sequence: GRCm39, M31 Gencode gene models.
3. Read Pre-processing, Alignment and Obtaining Gene Counts
3.1 Read Pre-processing
The sequencing data is in FASTQ format. The quality of the data is assessed using FastQC v.0.11.5 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
Adaptors are trimmed using Trim Galore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) v. 0.5.0. Trim Galore is running Cutadapt (https://cutadapt.readthedocs.org/en/stable/) v. 1.10. Trim Galore is run with the following parameters:
-q 25 – the reads are trimmed from the 3' end base by base, trimming stops if the quality of the base is greater than 25;
--clip_R1 6, --clip_R2 6 – clip the first 6 nucleotides from the 5' ends of read 1 and read 2;
--stringency 5 – an overlap of at least 5 nucleotides with the Illumina primer sequence is needed for trimming;
--length 40 – any read that is shorter than 40 nucleotides as a result of trimming is discarded;
--paired – only pairs of reads are retained (for paired-end reads only, not for single reads).
The type of adaptor is automatically detected by screening the first 1 million sequences of the first specified file for the first 12/13 nucleotides of the standard Illumina or Nextera primers and the sequence from the start of the primer to the 3' end of the read is trimmed.
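For reference, the parameters listed above can be assembled into a single Trim Galore call. This is a sketch only: the function name and the input file names are hypothetical, and any options used by the facility beyond those listed here are not known.

```python
import shlex


def trim_galore_command(read1: str, read2: str) -> list[str]:
    """Build a Trim Galore call using only the parameters listed in the text."""
    return [
        "trim_galore",
        "-q", "25",            # quality cutoff for 3' end trimming
        "--clip_R1", "6",      # remove 6 nt from the 5' end of read 1
        "--clip_R2", "6",      # remove 6 nt from the 5' end of read 2
        "--stringency", "5",   # minimum adapter overlap needed for trimming
        "--length", "40",      # discard reads shorter than 40 nt after trimming
        "--paired",            # keep only intact read pairs
        read1, read2,
    ]


# Hypothetical input names, shown as a shell-ready string.
print(shlex.join(trim_galore_command("Number_R1.fastq.gz", "Number_R2.fastq.gz")))
```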
The quality of the trimmed reads is re-assessed with FastQC.
The trimmed reads are also screened for presence of rRNA and mtRNA sequences using FastQ-Screen v.0.10.0 (http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/).
To assess the read distribution and positional read duplication, and to confirm the strandedness of the alignments, we use the RSeQC package (http://rseqc.sourceforge.net/), v. 2.6.2. The distribution of reads across exonic, intronic and intergenic sequences is assessed by the read_distribution.py program, infer_experiment.py is used for confirming strandedness, and read_duplication.py is used to obtain the positional read duplication (percentage of reads mapping to exactly the same genomic location). A sufficient proportion of reads should map to the exonic sequences (ideally > 70-80%). A large number of reads mapping to intronic sequences in a poly-A mRNA library suggests a significant presence of pre-mRNA or other issues with RNA preparation. For stranded RNA-seq experiments, the majority of the reads should map exclusively to one strand, the same as or opposite to the transcript, depending on the library preparation method. For non-stranded experiments, the reads should be equally distributed between both strands.
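The strandedness check can be illustrated by reading the two "explained by" fractions that infer_experiment.py prints for paired-end data. The sketch below is an assumption-laden illustration: the function name, the 0.8 threshold and the example report values are hypothetical and are not taken from this dataset.

```python
import re

# Illustrative report text only; the numbers are made up for the example.
EXAMPLE_REPORT = """This is PairEnd Data
Fraction of reads failed to determine: 0.02
Fraction of reads explained by "1++,1--,2+-,2-+": 0.03
Fraction of reads explained by "1+-,1-+,2++,2--": 0.95
"""


def classify_strandedness(report: str, threshold: float = 0.8) -> str:
    """Classify a paired-end infer_experiment.py report (threshold is hypothetical)."""
    fractions = dict(re.findall(r'explained by "([^"]+)": ([0-9.]+)', report))
    # "1++,1--,2+-,2-+": read 1 maps to the same strand as the transcript.
    same = float(fractions.get("1++,1--,2+-,2-+", 0.0))
    # "1+-,1-+,2++,2--": read 1 maps to the strand opposite to the transcript.
    opposite = float(fractions.get("1+-,1-+,2++,2--", 0.0))
    if same >= threshold:
        return "same strand as transcript"
    if opposite >= threshold:
        return "opposite strand to transcript"
    return "unstranded or inconclusive"


print(classify_strandedness(EXAMPLE_REPORT))
```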
3.2. Read Alignment
The trimmed reads are aligned to the reference genome using the STAR aligner, v.2.6.0c (https://github.com/alexdobin/STAR, https://academic.oup.com/bioinformatics/article/29/1/15/272537). The alignments are contained in the .bam files. The “.bam” files, together with the “.bai” index files, can be used for viewing the alignments in the Integrative Genomics Viewer (IGV, http://software.broadinstitute.org/software/igv/).
3.3. Obtaining Gene Counts
The filtered STAR alignments are processed to extract raw read counts for genes using htseq-count v.0.6.1p2 (HTSeq, http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html). Reads are assigned to genes by htseq-count in the “intersection-nonempty” mode, i.e. if a read overlaps with two overlapping genes and the overlap to gene A is greater than the overlap to gene B, the read is counted towards gene A, while if a read overlaps equally with gene A and gene B, then it is not counted towards either gene. To avoid introducing bias in the expression results, htseq-count does not count reads with multiple alignments; only uniquely mapping reads are counted.
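The assignment rule can be sketched as a small set operation. The HTSeq documentation defines this mode as the intersection of all non-empty gene sets found along the read; the code below is an illustration of that definition, not HTSeq's own code. The function name and the input format (one set of gene names per read position or segment) are assumptions.

```python
def assign_read(gene_sets_along_read: list[set[str]]) -> str:
    """Assign a read to a gene by intersecting the non-empty gene sets it covers.

    Each element is the set of genes annotated at one position (or segment)
    of the read; an empty set means no gene is annotated there.
    """
    non_empty = [set(genes) for genes in gene_sets_along_read if genes]
    if not non_empty:
        return "__no_feature"      # the read touches no annotated gene
    common = set.intersection(*non_empty)
    if len(common) == 1:
        return next(iter(common))  # exactly one gene explains the whole read
    if not common:
        return "__no_feature"      # no single gene spans all covered parts
    return "__ambiguous"           # several genes fit equally; not counted


# Part of the read lies in gene A only, the rest in the A/B overlap: counted for A.
print(assign_read([{"A"}, {"A", "B"}]))
# The whole read lies in the A/B overlap: counted for neither gene.
print(assign_read([{"A", "B"}]))
```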
4. Pre-processing, Alignment and Gene Counts QC
MultiQC (https://multiqc.info/) is a reporting tool that aggregates statistics generated by bioinformatics analyses across multiple samples. MultiQC v. 1.14 was used to generate a consolidated report from FastQC screening of both untrimmed and trimmed reads, and from RSeQC, FastQ Screen, STAR and htseq-count results. The MultiQC report is contained in MultiQC_Report_*.html file.
5. DGE Analysis with edgeR
Differential expression analysis was performed with the edgeR R package v.3.28.1, using R v.3.6.1 (http://www.bioconductor.org/packages/release/bioc/html/edgeR.html). The data set was filtered to retain only genes whose counts were >50 in at least 3 samples. This is intended to remove genes that are not expressed, or are expressed at a very low level.
The method used for normalizing the data was TMM, implemented by the calcNormFactors(y) function. All samples were normalized and filtered together. The glmLRT functionality in edgeR was used for the differential expression tests, with sample group taken into account.
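The expression filter described above (count >50 in at least 3 samples) is simple enough to show directly. This sketch illustrates only the filtering rule and does not reproduce the edgeR workflow (TMM normalization and glmLRT are not implemented here); the function name and the toy counts are hypothetical.

```python
def filter_genes(counts: dict[str, list[int]],
                 min_count: int = 50,
                 min_samples: int = 3) -> dict[str, list[int]]:
    """Keep genes whose raw count exceeds min_count in at least min_samples samples."""
    return {
        gene: row
        for gene, row in counts.items()
        if sum(count > min_count for count in row) >= min_samples
    }


# Toy raw counts for four samples per gene (hypothetical values).
toy_counts = {
    "geneA": [120, 80, 51, 3],   # above 50 in three samples: kept
    "geneB": [500, 60, 50, 10],  # above 50 in only two samples: removed
    "geneC": [0, 0, 1, 2],       # essentially not expressed: removed
}
print(sorted(filter_genes(toy_counts)))
```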
EdgeR Results Legend:
· GeneID – Ensembl Gene ID;
· Chr.Start.End - gene coordinates;
· GeneName, GeneType, etc. – Gene attributes, derived from the genome annotation;
· logFC - Log2 Fold Change (use this column for selection of DEGs);
· logCPM - Log2 Counts Per Million, average for all libraries;
· LR – Statistic calculated by the LR-Test;
· PValue - Differential expression P value;
· FDR – Differential expression False Discovery Rate, calculated by the Benjamini-Hochberg method (use this column for selection of DEGs);
· (columns labeled with sample names) – Fragments Per Kilobase of transcript per Million mapped reads (FPKMs) for the given samples.
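Based on the legend above, differentially expressed genes can be selected from a results file using the logFC and FDR columns. The following is a minimal sketch: the function name and the cutoffs (FDR < 0.05, |logFC| >= 1) are hypothetical choices and are not thresholds stated for this dataset.

```python
import csv
from typing import Iterable


def select_degs(lines: Iterable[str],
                fdr_max: float = 0.05,
                min_abs_logfc: float = 1.0) -> list[str]:
    """Return GeneIDs from a tab-separated edgeR results table that pass both cutoffs.

    Assumes the header contains the GeneID, logFC and FDR columns named in
    the legend; the default cutoffs are illustrative only.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [
        row["GeneID"]
        for row in reader
        if float(row["FDR"]) < fdr_max and abs(float(row["logFC"])) >= min_abs_logfc
    ]
```

With a downloaded results file, a call would look like `with open("results.tsv", newline="") as handle: degs = select_degs(handle)`, where the file name is a placeholder.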
Files
(41.4 GB)
| md5 | Size |
|---|---|
| df806fc66df2f03f40abcfd06c066905 | 2.0 GB |
| 74706d95c907527d3b27b72043a85a21 | 2.0 GB |
| 9526c4e2fa834e982215f7ac42fec7fc | 1.9 GB |
| be55dda03b5060b3bc60f26d0bc541cb | 2.0 GB |
| f6b57b71605d52291afa79fe3c4de4fe | 666.9 MB |
| 35a549de6d36caf3e2349c36de6f33a2 | 2.3 GB |
| 0a31630c76cc2f3bced01d1eec799fdb | 2.1 GB |
| a636aa8c5a780e6a95f4beec8a7d790d | 2.1 GB |
| 2d86e3097831b2ce7135017b13f28afd | 3.6 MB |
| dcc53d5122bd1f2b4240c684116fafa5 | 1.2 GB |
| 96f76e0196ee0b9b04a3db59f392d1ae | 653.1 MB |
| 35381f47816279a8c2a50662777ac9ba | 2.5 GB |
| 8cbd439dc2a781f5b51b80ba7072b0ed | 1.2 GB |
| 0ac45790d713dd613b50a977984c2a1b | 1.2 GB |
| ef988bd5d27ba3b2598b56f073fd3963 | 1.2 GB |
| 1d2183bb30ed89765127edce08eee560 | 1.6 GB |
| d9eb7db4471761f5f716170644aa4cfd | 1.6 GB |
| f256c67db8f1ce32206832e6476d15ed | 2.6 GB |
| 9d0388bb15219f2c5a8abbe076e36f18 | 2.6 GB |
| 6fd163331c35494543915234932e8dbb | 1.7 GB |
| 9b2c0da7c1d40e69ecd0d011be608a69 | 1.7 GB |
| 0979555c529da2dc51a5a5a3f1934482 | 1.6 GB |
| bf8438cfa3e78a4f09302df90f5cec9c | 1.6 GB |
| a1b0afc6c802222e3c39abf3efc12193 | 1.7 GB |
| 2cbc0d0e8c2e5fb9efb978de0e865961 | 1.7 GB |
| 32b3983cadd088bc0a75033b4fe3ebdf | 3.5 MB |
| 68178fa1c7d73caf288c5e44d44cb425 | 3.5 MB |
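Each file in this record is listed with an md5 checksum, which can be used to confirm that a download is complete. A minimal sketch using only the Python standard library follows; the function names are hypothetical, and the checksum may be passed with or without the "md5:" prefix.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path: str, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.strip().lower().removeprefix("md5:")
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use low, which matters for files of around 2 GB such as the ones listed here.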
Additional details
Identifiers
- PMID: 39099311
Related works
- Is described by: Journal article, PMID 39099311
Funding
- Canadian Institutes of Health Research, 6210100686