RNA-Sequencing Part 3 Generation and characterization of a novel mouse model of Becker Muscular Dystrophy with a deletion of exons 52 to 55

Perillat, Lucie

doi:10.5281/zenodo.17095147

Published September 10, 2025 | Version v1

Dataset Open

Perillat, Lucie (Researcher)¹

1. Hospital for Sick Children

Becker muscular dystrophy (BMD) is a rare X-linked recessive neuromuscular disorder, frequently caused by in-frame deletions in the DMD gene that result in the production of a truncated, yet functional, dystrophin protein. The consequences of BMD-causing in-frame deletions on the organism are difficult to predict, especially in regard to long-term prognosis. Here, we used CRISPR-Cas9 to generate a new Dmd Δ52-55 mouse model by deleting exons 52-55 in the Dmd gene, resulting in a BMD-like in-frame deletion. To delineate the long-term effects of this deletion, we studied these mice over 52 weeks by performing histology and echocardiography analyses and assessing motor functions. To further delineate the effects of the exons 52-55 in-frame deletion, we performed RNA-Seq pre- and post-exercise and identified several differentially expressed pathways that could explain the abnormal muscle phenotype observed at 52 weeks in the BMD model.

This dataset contains the results and raw data of the RNA-sequencing and transcriptomic analysis for 52-week-old exercised and non-exercised mice (4 BMD, 4 WT and 4 DMD, as indicated in the name of each file).

Due to size restrictions, this RNA-Seq dataset is published on Zenodo in 3 parts. This third part contains the data for the non-exercised mice, including the fastq files (R1 and R2) and the differentially expressed genes (tsv files). Our team extracted the fastq files from the alignment (bam) files as follows:

1. Starting with the original file (Number.Aligned.sortedByCoord.out.bam), using samtools, we sorted by name:

samtools sort -n Number.Aligned.sortedByCoord.out.bam -o Number.Aligned.namesorted.bam

2. We extracted the paired reads into 2 separate files for R1 and R2, and any singleton or orphaned reads into additional RS and R0 files, respectively (many of the RS and R0 files were empty and not added here due to size constraints):

samtools fastq -1 Number_R1.fastq -2 Number_R2.fastq -0 Number_R0.fastq -s Number_RS.fastq Number.Aligned.namesorted.bam

3. We compressed all of the files with gzip, producing files with the ‘.gz’ extension:

gzip -9 Number_R1.fastq

.bam and RS/R0 files were not added due to size constraints but are available upon request.
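After downloading, it can be useful to confirm that each R1/R2 pair is still in sync. The following is a minimal sketch, not part of the original workflow: it assumes well-formed four-line fastq records and that mate names differ only by an optional /1 or /2 suffix. The function names are hypothetical.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield read names from a gzipped fastq file, dropping a trailing /1 or /2."""
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # first line of each four-line record is the header
                name = line[1:].split()[0]
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def check_pairs(r1_path, r2_path):
    """Return the number of read pairs; raise ValueError if the files disagree."""
    n = 0
    for a, b in zip_longest(read_names(r1_path), read_names(r2_path)):
        if a is None or b is None:
            raise ValueError(f"files differ in length after {n} records")
        if a != b:
            raise ValueError(f"record {n}: {a!r} != {b!r}")
        n += 1
    return n
```

For example, check_pairs("BMD_6251_R1.fastq.gz", "BMD_6251_R2.fastq.gz") would return the pair count for that sample or raise an error at the first mismatch.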

Upstream workflow performed by TCAG (SickKids):

2. RNA-Seq Library and Reference Genome Information

Type of library: stranded, paired end

Genome reference sequence: GRCm39, M31 Gencode gene models.

3. Read Pre-processing, Alignment and Obtaining Gene Counts

3.1 Read Pre-processing

The sequencing data is in FASTQ format. The quality of the data is assessed using FastQC v.0.11.5 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).

Adaptors are trimmed using Trim Galore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) v. 0.5.0. Trim Galore is running Cutadapt (https://cutadapt.readthedocs.org/en/stable/) v. 1.10. Trim Galore is run with the following parameters:

-q 25 – the reads are trimmed from the 3' end base by base, trimming stops if the quality of the base is greater than 25;

--clip_R1 6, --clip_R2 6 – clip the first 6 nucleotides from the 5' ends of read 1 and read 2;

--stringency 5 – an overlap of at least 5 nucleotides with the Illumina primer sequence is needed for trimming;

--length 40 – any read that is shorter than 40 nucleotides as a result of trimming is discarded;

--paired – only pairs of reads are retained (for paired-end reads only, not for single reads).

The type of adaptor is automatically detected by screening the first 1 million sequences of the first specified file for the first 12/13 nucleotides of the standard Illumina or Nextera primers. The sequence from the start of the primer to the 3' end of the read is then trimmed.
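The -q description above is a simplification. The Cutadapt documentation describes 3' quality trimming as a running-sum rule: the cutoff is subtracted from each quality, the values are accumulated from the 3' end, and the read is cut where the sum is minimal. The sketch below illustrates that rule together with the --length/--paired filter. It is an illustration under those assumptions, not the Trim Galore source; the function names are hypothetical, and adaptor removal and 5' clipping are not modelled.

```python
def quality_trim_3prime(quals, cutoff=25):
    """Return how many bases to keep after 3' quality trimming.

    quals is a list of Phred scores in read order (5' to 3').
    """
    running = 0          # accumulated (cutoff - quality) from the 3' end
    best = 0             # largest accumulated value seen so far
    keep = len(quals)    # default: keep the whole read
    for i in reversed(range(len(quals))):
        running += cutoff - quals[i]
        if running < 0:  # good bases now outweigh the bad ones; stop
            break
        if running > best:
            best = running
            keep = i
    return keep


def keep_pair(len_r1, len_r2, min_length=40):
    """Retain a pair only if both mates are at least min_length after trimming."""
    return len_r1 >= min_length and len_r2 >= min_length
```

With the qualities 42, 40, 26, 27, 8, 7, 11, 4, 2, 3 and a cutoff of 10, the function keeps the first four bases: the isolated quality of 11 does not stop the trimming, which is where the running-sum rule differs from a strict base-by-base stop.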

The quality of the trimmed reads is re-assessed with FastQC.

The trimmed reads are also screened for presence of rRNA and mtRNA sequences using FastQ-Screen v.0.10.0 (http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/).

To assess the read distribution and positional read duplication, and to confirm the strandedness of the alignments, we use the RSeQC package (http://rseqc.sourceforge.net/), v. 2.6.2. The distribution of reads across exonic, intronic and intergenic sequences is assessed by the read_distribution.py program, infer_experiment.py is used for confirming strandedness, and read_duplication.py is used to obtain the positional read duplication (percentage of reads mapping to exactly the same genomic location). A sufficient proportion of reads should map to the exonic sequences (ideally > 70-80%). A large number of reads mapping to intronic sequences in a poly-A mRNA library suggests a significant presence of pre-mRNA or other issues with RNA preparation. For stranded RNA-seq experiments, the majority of the reads should map exclusively to one strand, same or opposite to the transcript, depending on the library preparation method. For non-stranded experiments, the reads should be equally distributed between both strands.

3.2. Read Alignment

The trimmed reads are aligned to the reference genome using the STAR aligner, v.2.6.0c (https://github.com/alexdobin/STAR, https://academic.oup.com/bioinformatics/article/29/1/15/272537). The alignments are contained in the .bam files. The “.bam” files, together with the “.bai” files, can be used for viewing the alignments in the Integrative Genomics Viewer (IGV, http://software.broadinstitute.org/software/igv/).

3.3. Obtaining Gene Counts

The filtered STAR alignments are processed to extract raw read counts for genes using htseq-count v.0.6.1p2 (HTSeq, http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html). Reads are assigned to genes by htseq-count in the “intersection-nonempty” mode, i.e. if a read overlaps with two overlapping genes and the overlap to gene A is greater than the overlap to gene B, the read is counted towards gene A, while if a read overlaps equally with gene A and gene B, it is not counted towards either gene. To avoid introducing bias in the expression results, htseq-count does not count reads with multiple alignments; only uniquely mapping reads are counted.
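The assignment rule can be sketched as follows. This is a simplified illustration of the “intersection-nonempty” mode as described in the HTSeq documentation, not the htseq-count source: the read is represented as a list of gene sets, one per aligned position, the function name is hypothetical, and treating an empty intersection as no feature is an assumption of this sketch.

```python
def assign_read(position_features):
    """Assign one read, given the set of overlapping genes at each aligned position."""
    # Positions that overlap no gene are ignored in this mode.
    nonempty = [set(s) for s in position_features if s]
    if not nonempty:
        return "__no_feature"
    # Keep only genes present at every gene-covered position of the read.
    common = set.intersection(*nonempty)
    if not common:
        return "__no_feature"
    if len(common) > 1:
        return "__ambiguous"
    return next(iter(common))
```

A read covered by gene A along its full length and by gene B only in part is assigned to A, whereas a read covered equally by A and B is reported as ambiguous and counted towards neither gene.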

4. Pre-processing, Alignment and Gene Counts QC

MultiQC (https://multiqc.info/) is a reporting tool that aggregates statistics generated by bioinformatics analyses across multiple samples. MultiQC v. 1.14 was used to generate a consolidated report from FastQC screening of both untrimmed and trimmed reads, and from RSeQC, FastQ Screen, STAR and htseq-count results. The MultiQC report is contained in the MultiQC_Report_*.html file.

5. DGE Analysis with edgeR

Differential expression analysis was performed with the edgeR R package v.3.28.1 (http://www.bioconductor.org/packages/release/bioc/html/edgeR.html), using R v.3.6.1. The data set was filtered to retain only genes whose counts were >50 in at least 3 samples. This filter is intended to remove genes that are not expressed or are expressed at a very low level.
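The filtering rule can be illustrated with a short sketch. The original analysis was done in R; the Python function below is a hypothetical equivalent that assumes raw counts are held as one list per gene, in sample order.

```python
def filter_genes(counts, min_count=50, min_samples=3):
    """Keep genes whose raw count exceeds min_count in at least min_samples samples.

    counts maps a gene ID to its list of raw counts, one value per sample.
    """
    return {
        gene: values
        for gene, values in counts.items()
        if sum(1 for v in values if v > min_count) >= min_samples
    }
```

Note that the comparison is strict, so a count of exactly 50 does not contribute towards the 3 required samples.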

The method used for normalizing the data was TMM, implemented by the calcNormFactors(y) function. All samples were normalized and filtered together. The glmLRT functionality in edgeR was used for the differential expression tests, with sample group taken into account.

EdgeR Results Legend:

· GeneID – Ensembl Gene ID;

· Chr.Start.End - gene coordinates;

· GeneName, GeneType, etc. – Gene attributes, derived from the genome annotation;

· logFC - Log2 Fold Change (use this column for selection of DEGs);

· logCPM - Log2 Counts Per Million, average for all libraries;

· LR – Statistic calculated by the LR-Test;

· PValue - Differential expression P value;

· FDR – Differential expression False Discovery Rate, calculated by the Benjamini-Hochberg method (use this column for selection of DEGs);

· (columns labeled with sample names) – Fragments Per Kilobase of transcript per Million mapped reads (FPKMs) for the given samples.
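As an illustration of how the logFC and FDR columns might be used together, the sketch below reads one of the tab-separated result files and keeps genes passing both thresholds. The column names are taken from the legend above and are assumed to match the file headers exactly; the thresholds (FDR < 0.05, |logFC| >= 1) are common conventions chosen for this example and are not stated in the dataset description.

```python
import csv


def select_degs(handle, fdr_max=0.05, min_abs_logfc=1.0):
    """Return (GeneID, logFC, FDR) tuples for rows passing both thresholds."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        logfc = float(row["logFC"])
        fdr = float(row["FDR"])
        if fdr < fdr_max and abs(logfc) >= min_abs_logfc:
            hits.append((row["GeneID"], logfc, fdr))
    return hits
```

A typical call would open a results file in text mode, for example with open("WT_noEx_vs_BMD_noEx_DifExp_FPKMs_edgeR.tsv", newline="") as handle, and pass the handle to select_degs.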

Files (41.4 GB)

Name	MD5	Size
BMD_6251_R1.fastq.gz	df806fc66df2f03f40abcfd06c066905	2.0 GB
BMD_6251_R2.fastq.gz	74706d95c907527d3b27b72043a85a21	2.0 GB
BMD_6252_R1.fastq.gz	9526c4e2fa834e982215f7ac42fec7fc	1.9 GB
BMD_6252_R2.fastq.gz	be55dda03b5060b3bc60f26d0bc541cb	2.0 GB
BMD_6260_R1.fastq.gz	f6b57b71605d52291afa79fe3c4de4fe	666.9 MB
BMD_6260_R2.fastq.gz	35a549de6d36caf3e2349c36de6f33a2	2.3 GB
BMD_6378_R1.fastq.gz	0a31630c76cc2f3bced01d1eec799fdb	2.1 GB
BMD_6378_R2.fastq.gz	a636aa8c5a780e6a95f4beec8a7d790d	2.1 GB
BMD_noEx_vs_DMD_noEx_DifExp_FPKMs_edgeR.tsv	2d86e3097831b2ce7135017b13f28afd	3.6 MB
DMD_3976_R1.fastq.gz	dcc53d5122bd1f2b4240c684116fafa5	1.2 GB
DMD_3976_R2.fastq.gz	96f76e0196ee0b9b04a3db59f392d1ae	653.1 MB
DMD_3981_R1.fastq.gz	35381f47816279a8c2a50662777ac9ba	2.5 GB
DMD_3981_R2.fastq.gz	8cbd439dc2a781f5b51b80ba7072b0ed	1.2 GB
DMD_6098_R1.fastq.gz	0ac45790d713dd613b50a977984c2a1b	1.2 GB
DMD_6098_R2.fastq.gz	ef988bd5d27ba3b2598b56f073fd3963	1.2 GB
DMD_7880_R1.fastq.gz	1d2183bb30ed89765127edce08eee560	1.6 GB
DMD_7880_R2.fastq.gz	d9eb7db4471761f5f716170644aa4cfd	1.6 GB
WT_7415_R1.fastq.gz	f256c67db8f1ce32206832e6476d15ed	2.6 GB
WT_7415_R2.fastq.gz	9d0388bb15219f2c5a8abbe076e36f18	2.6 GB
WT_7886_R1.fastq.gz	6fd163331c35494543915234932e8dbb	1.7 GB
WT_7886_R2.fastq.gz	9b2c0da7c1d40e69ecd0d011be608a69	1.7 GB
WT_7887_R1.fastq.gz	0979555c529da2dc51a5a5a3f1934482	1.6 GB
WT_7887_R2.fastq.gz	bf8438cfa3e78a4f09302df90f5cec9c	1.6 GB
WT_7888_R1.fastq.gz	a1b0afc6c802222e3c39abf3efc12193	1.7 GB
WT_7888_R2.fastq.gz	2cbc0d0e8c2e5fb9efb978de0e865961	1.7 GB
WT_noEx_vs_BMD_noEx_DifExp_FPKMs_edgeR.tsv	32b3983cadd088bc0a75033b4fe3ebdf	3.5 MB
WT_noEx_vs_DMD_noEx_DifExp_FPKMs_edgeR.tsv	68178fa1c7d73caf288c5e44d44cb425	3.5 MB
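The MD5 checksums listed above can be used to confirm that a download completed without corruption. The sketch below is one way to do this with the Python standard library; the function names are hypothetical, and the file is read in chunks so that multi-gigabyte fastq.gz files do not need to fit in memory.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 digest matches the listed checksum."""
    return md5_of(path) == expected_md5.strip().lower()
```

For example, verify("WT_noEx_vs_DMD_noEx_DifExp_FPKMs_edgeR.tsv", "68178fa1c7d73caf288c5e44d44cb425") should return True for an intact copy of that file.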

Additional details

PMID: 39099311

Is described by: Journal article: 39099311 (PMID)

Canadian Institutes of Health Research
6210100686
