Published September 12, 2025 | Version v1
Dataset Open

RNA-Sequencing Part 2: Generation and characterization of a novel mouse model of Becker Muscular Dystrophy with a deletion of exons 52 to 55

  • 1. Hospital for Sick Children

Description

Becker muscular dystrophy (BMD) is a rare X-linked recessive neuromuscular disorder, frequently caused by in-frame deletions in the DMD gene that result in the production of a truncated, yet functional, dystrophin protein. The consequences of BMD-causing in-frame deletions on the organism are difficult to predict, especially in regard to long-term prognosis. Here, we used CRISPR-Cas9 to generate a new Dmd Δ52-55 mouse model by deleting exons 52-55 in the Dmd gene, resulting in a BMD-like in-frame deletion. To delineate the long-term effects of this deletion, we studied these mice over 52 weeks by performing histology and echocardiography analyses and assessing motor functions. To further delineate the effects of the exons 52-55 in-frame deletion, we performed RNA-Seq pre- and post-exercise and identified several differentially expressed pathways that could explain the abnormal muscle phenotype observed at 52 weeks in the BMD model.

This dataset contains the results and raw data of the RNA-sequencing and transcriptomic analysis for 52-week-old exercised and non-exercised mice (4 BMD, 4 WT and 4 DMD, as indicated in the file names).

1. Due to size restrictions, this RNA-Seq dataset will be published on Zenodo in 3 parts. This second part contains the data for the exercised mice: the FASTQ files (R1 and R2) and their associated MD5 checksum (.md5) files for the 2 DMD mice (15321 and 15322) and 4 WT mice (2699, 2700, 15323, 15324). Raw gene counts (.txt files) and differentially expressed genes (.tsv files) are in Part 1.
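Because each FASTQ file ships with an .md5 file, a download can be checked before analysis. The following is a minimal sketch, assuming each .md5 file holds the hex digest as its first whitespace-separated token (the usual md5sum layout; the exact layout of the files in this record is not stated here). The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_path):
    """Read the expected digest from a sidecar .md5 file.

    Assumption: the first whitespace-separated token is the hex digest.
    """
    return Path(md5_path).read_text().split()[0].lower()


def verify(fastq_path, md5_path):
    """Return True if the file's digest matches the one recorded in the .md5 file."""
    return md5_of_file(fastq_path) == expected_md5(md5_path)
```

Calling verify() with a FASTQ path and its .md5 path returns True only when the two digests agree; a False result suggests an incomplete or corrupted download.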

Workflow (performed by TCAG at SickKids): 

2. RNA-Seq Library and Reference Genome Information

Type of library: stranded, paired-end

Genome reference sequence: GRCm39, with GENCODE M31 gene models.

 

3. Read Pre-processing, Alignment and Obtaining Gene Counts

3.1 Read Pre-processing

The sequencing data is in FASTQ format. The quality of the data is assessed using FastQC v.0.11.5 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).

 

Adaptors are trimmed using Trim Galore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) v. 0.5.0, which runs Cutadapt (https://cutadapt.readthedocs.org/en/stable/) v. 1.10. Trim Galore is run with the following parameters:

            -q 25 – the reads are trimmed from the 3' end base by base; trimming stops when the quality of the base is greater than 25;

            --clip_R1 6, --clip_R2 6 – clip the first 6 nucleotides from the 5' ends of read 1 and read 2;

            --stringency 5 – an overlap of at least 5 nucleotides with the Illumina primer sequence is required for trimming;

            --length 40 – any read that is shorter than 40 nucleotides as a result of trimming is discarded;

            --paired – only pairs of reads are retained (for paired-end reads only, not for single reads).

The type of adaptor is automatically detected by screening the first 1 million sequences of the first specified file for the first 12/13 nucleotides of the standard Illumina or Nextera primers; the sequence from the start of the primer to the 3' end of the read is then trimmed.
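The parameters listed above can be assembled into a single Trim Galore call. This is a sketch only: the flags are those named in the list, while the output directory, the --output_dir option and the input file names are illustrative assumptions and are not taken from the TCAG workflow.

```python
import shlex


def trim_galore_command(read1, read2, output_dir="trimmed"):
    """Build the Trim Galore argument list for one paired-end sample.

    output_dir is a hypothetical location; the flag values mirror the
    parameter list in the text.
    """
    return [
        "trim_galore",
        "-q", "25",            # quality cutoff for 3' end trimming
        "--clip_R1", "6",      # remove 6 nt from the 5' end of read 1
        "--clip_R2", "6",      # remove 6 nt from the 5' end of read 2
        "--stringency", "5",   # minimum adapter overlap required for trimming
        "--length", "40",      # discard reads shorter than 40 nt after trimming
        "--paired",            # keep only intact read pairs
        "--output_dir", output_dir,
        read1,
        read2,
    ]


# Hypothetical input file names, for illustration only.
cmd = trim_galore_command("sample_R1.fastq.gz", "sample_R2.fastq.gz")
print(shlex.join(cmd))
```

With Trim Galore installed, the list could be passed to subprocess.run(cmd, check=True) to execute it.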

 

The quality of the trimmed reads is re-assessed with FastQC.

 

The trimmed reads are also screened for presence of rRNA and mtRNA sequences using FastQ-Screen v.0.10.0 (http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/).

 

To assess the read distribution and positional read duplication, and to confirm the strandedness of the alignments, we use the RSeQC package (http://rseqc.sourceforge.net/), v. 2.6.2. The distribution of reads across exonic, intronic and intergenic sequences is assessed by the read_distribution.py program, infer_experiment.py is used to confirm strandedness, and read_duplication.py is used to obtain the positional read duplication (percentage of reads mapping to exactly the same genomic location). A sufficient proportion of reads should map to exonic sequences (ideally > 70-80%). A large number of reads mapping to intronic sequences in a poly-A mRNA library suggests a significant presence of pre-mRNA or other issues with RNA preparation. For stranded RNA-seq experiments, the majority of the reads should map exclusively to one strand, the same as or opposite to the transcript, depending on the library preparation method. For non-stranded experiments, the reads should be equally distributed between both strands.
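The interpretation rules in the paragraph above can be written as two small checks. This sketch assumes the two strand fractions have already been read from the infer_experiment.py report; the 0.8 strand threshold is an illustrative assumption, the 0.7 exonic minimum reflects the lower end of the 70-80% guideline, and the function names are hypothetical.

```python
def classify_strandedness(frac_same, frac_opposite, threshold=0.8):
    """Classify a library from the fractions of reads on each strand.

    frac_same: fraction of reads on the same strand as the transcript.
    frac_opposite: fraction of reads on the opposite strand.
    threshold: assumed cutoff for calling a library stranded.
    """
    if frac_same >= threshold:
        return "same"
    if frac_opposite >= threshold:
        return "opposite"
    return "unstranded"


def exonic_fraction_ok(exonic_fraction, minimum=0.7):
    """Return True if the exonic read fraction exceeds the assumed minimum."""
    return exonic_fraction > minimum
```

For example, fractions of 0.02 and 0.96 would be classified as "opposite", while roughly equal fractions would be classified as "unstranded".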

 

3.2. Read Alignment

The trimmed reads are aligned to the reference genome using the STAR aligner, v.2.6.0c (https://github.com/alexdobin/STAR, https://academic.oup.com/bioinformatics/article/29/1/15/272537). The alignments are contained in the .bam files. The “.bam” files, together with the “.bai” files, can be used to view the alignments in the Integrative Genomics Viewer (IGV, http://software.broadinstitute.org/software/igv/).

 

3.3. Obtaining Gene Counts

The filtered STAR alignments are processed to extract raw read counts for genes using htseq-count v.0.6.1p2 (HTSeq, http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html). Reads are assigned to genes by htseq-count in the “intersection-nonempty” mode, i.e. if a read overlaps with two overlapping genes and the overlap to gene A is greater than the overlap to gene B, the read is counted towards gene A, while if a read overlaps equally with gene A and gene B, it is not counted towards either gene. To avoid introducing bias in the expression results, htseq-count does not count reads with multiple alignments; only uniquely mapping reads are counted.
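The assignment rule can be illustrated with a toy re-implementation. This is a simplified sketch of the intersection-nonempty logic as described in the HTSeq documentation (intersect the gene sets found at every covered position, skipping positions with no gene), not the HTSeq code itself. It ignores strand and spliced alignments, and the gene names and coordinates are invented.

```python
def assign_read(read_start, read_end, genes):
    """Assign one read to a gene using intersection-nonempty logic.

    read_start, read_end: half-open read interval [start, end).
    genes: dict mapping gene name to a list of half-open (start, end) intervals.
    Returns a gene name, "__ambiguous" or "__no_feature".
    """
    result = None
    for pos in range(read_start, read_end):
        # Set of genes covering this position.
        features = {
            name
            for name, intervals in genes.items()
            if any(start <= pos < end for start, end in intervals)
        }
        if not features:
            continue  # positions without any gene are skipped in this mode
        result = features if result is None else result & features
    if not result:
        return "__no_feature"
    if len(result) == 1:
        return next(iter(result))
    return "__ambiguous"


# Toy annotation: gene A and gene B overlap between positions 80 and 100.
toy_genes = {"A": [(0, 100)], "B": [(80, 200)]}
```

With this toy annotation, a read spanning 60 to 90 is assigned to A, because part of it lies in A only, whereas a read spanning 85 to 95 lies entirely in the shared region and is reported as ambiguous.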

 

4. Pre-processing, Alignment and Gene Counts QC

MultiQC (https://multiqc.info/) is a reporting tool that aggregates statistics generated by bioinformatics analyses across multiple samples. MultiQC v. 1.14 was used to generate a consolidated report from FastQC screening of both untrimmed and trimmed reads, and from RSeQC, FastQ Screen, STAR and htseq-count results. The MultiQC report is contained in the MultiQC_Report_*.html file.

 

5. DGE Analysis with edgeR

Differential expression analysis was performed with the edgeR R package v.3.28.1, using R v.3.6.1 (http://www.bioconductor.org/packages/release/bioc/html/edgeR.html). The data set was filtered to retain only genes whose gene counts were >50 in at least 3 samples. This is intended to remove genes that are not expressed, or expressed at a very low level.
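The filtering rule can be sketched in a few lines. The analysis itself was run in R with edgeR; the Python below only illustrates the stated criterion (count > 50 in at least 3 samples) on a plain dictionary, and the function name and example counts are invented.

```python
def filter_low_counts(counts, min_count=50, min_samples=3):
    """Keep genes whose raw count exceeds min_count in at least min_samples samples.

    counts: dict mapping gene ID to a list of raw counts, one per sample.
    """
    return {
        gene: values
        for gene, values in counts.items()
        if sum(value > min_count for value in values) >= min_samples
    }


# Invented counts for three hypothetical genes across four samples.
example_counts = {
    "gene_high": [120, 98, 240, 75],
    "gene_borderline": [51, 60, 50, 10],
    "gene_low": [0, 3, 1, 2],
}
kept = filter_low_counts(example_counts)
```

In this example only gene_high is retained: gene_borderline exceeds 50 in just two samples, since a count of exactly 50 does not pass the strict threshold.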

 

The method used for normalizing the data was TMM, implemented by the calcNormFactors(y) function. All samples were normalized and filtered together. The glmLRT functionality in edgeR was used for the differential expression tests, with sample group taken into account.

 

EdgeR Results Legend:

·      GeneID – Ensembl Gene ID;

·      Chr.Start.End - gene coordinates;

·      GeneName, GeneType, etc. – Gene attributes, derived from the genome annotation;

·      logFC - Log2 Fold Change (use this column for selection of DEGs);

·      logCPM - Log2 Counts Per Million, average for all libraries;

·      LR – Statistic calculated by the LR-Test;

·      PValue - Differential expression P value;

·      FDR – Differential expression False Discovery Rate, calculated by the Benjamini-Hochberg method (use this column for selection of DEGs);

·      (columns labeled with sample names) – Fragments Per Kilobase of transcript per Million mapped reads (FPKMs) for the given samples.
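The legend recommends the logFC and FDR columns for selecting DEGs. The sketch below shows one way to do so on a tab-separated results table; the cutoffs (FDR < 0.05 and |logFC| >= 1) are common illustrative choices and are not thresholds stated for this dataset, and the example rows are invented.

```python
import csv
import io


def select_degs(tsv_text, fdr_cutoff=0.05, min_abs_logfc=1.0):
    """Return GeneIDs passing assumed FDR and absolute log2 fold change cutoffs.

    tsv_text: contents of a tab-separated table with GeneID, logFC and FDR columns.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    selected = []
    for row in reader:
        if float(row["FDR"]) < fdr_cutoff and abs(float(row["logFC"])) >= min_abs_logfc:
            selected.append(row["GeneID"])
    return selected


# Invented rows; real tables carry additional columns described in the legend.
example_tsv = (
    "GeneID\tlogFC\tFDR\n"
    "GENE_A\t2.3\t0.001\n"
    "GENE_B\t-1.8\t0.03\n"
    "GENE_C\t0.4\t0.0005\n"
    "GENE_D\t3.1\t0.20\n"
)
degs = select_degs(example_tsv)
```

Here GENE_A and GENE_B are selected; GENE_C fails the fold change cutoff and GENE_D fails the FDR cutoff.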

Files (45.1 GB)

MD5 checksum, size:
md5:8ae4bb93f0a084b0d41522907db3f06c  3.6 GB
md5:6bc77c50e1b8c67d235ed7365114c31d  69 Bytes
md5:8d5b958ebd0c1c32163552b448874071  3.7 GB
md5:5d86d29afcf7a30344f0f4cbb60eea0a  69 Bytes
md5:51d0dd4ea9d66473d84f1cebe46ab967  4.6 GB
md5:aa914f6f210bc2129f1f5cc84d5e4690  69 Bytes
md5:75caad2e10c8ee07579ef6372ac32c0a  4.8 GB
md5:edd1d5a075882622e4b4c51c033396fd  69 Bytes
md5:d12534da673c4c6012da87ec8458e419  3.4 GB
md5:eee0b49e8daf13fd94326be20ab17566  67 Bytes
md5:14e33cddcd3dae75592f8a1ac9da867c  3.5 GB
md5:2f623ea7ed0c504767661e21c2476bbc  67 Bytes
md5:f5589773af47613f167245d37f2ef942  3.6 GB
md5:793f8431645dd9668050549cdb9dda51  67 Bytes
md5:614b1d138ef227b99c1afa1eed5acb1c  3.9 GB
md5:75508ee597ad7f76fc67c39bfc571690  67 Bytes
md5:cbb4c66f2392e6a7a2fedeb646ce0fdd  3.4 GB
md5:12457ad2999579f6eeacc40c42365a5c  66 Bytes
md5:c80e82958369ed79c8c8f9eff618e3a0  3.6 GB
md5:ac24bbcdc567d91bbd068ac5c752aa32  66 Bytes
md5:7744a2dad1239888d76774c4c7a63511  3.3 GB
md5:f535104194027a279aff79995e183315  66 Bytes
md5:e79acd41ea72fa23cd084de3f323946e  3.5 GB
md5:1f935faaf41957551bcfd45be828e2c6  66 Bytes

Additional details

Related works

Is described by
Journal article: 39099311 (PMID)

Funding

Canadian Institutes of Health Research
6210100686