Data for Long-read RNA sequencing identifies region- and sex-specific C57BL/6J mouse brain mRNA isoform expression and usage

Jones, Emma F.; Howton, Timothy C.; Flanary, Victoria L.; Clark, Amanda D.; Lasseigne, Brittany N.

doi:10.5281/zenodo.10381745

Published December 14, 2023 | Version v1

Dataset Open

1. University of Alabama at Birmingham

Contributors

Contact person:

Lasseigne, Brittany N.¹

1. University of Alabama at Birmingham

data_minus_bam.tar.gz contains all files from the data directory (except for bam outputs) associated with the 230227_EJ_MouseBrainIsoDiv GitHub project and includes the following:

- comparison_gene_lists/: The RData file in this directory contains all comparison gene lists for differential gene expression (DGE), differential transcript expression (DTE), and differential transcript usage (DTU), for importing into the R environment and reproducing analyses.

- all_comparison_gene_lists.Rdata

- cpm_out/: The RData file in this directory contains the processed counts per million and formatted metadata for downstream analyses.

- cpm_counts_metadata.RData

- deseq2_data/: All files in this directory are Rds files with DESeq2 results for the study design indicated in the file name. A file name that includes “gene” denotes a gene-level analysis, and “transcript” denotes a transcript-level analysis. A file name with two regions is a comparison between those two regions; a file name with one region denotes either “one vs all” or “male vs female”. Any file name that includes “sex” is male vs female in the indicated region(s).

- all_regions_sex_gene_results.Rds

- all_regions_sex_transcript_results.Rds

- cerebellum_cortex_results.Rds

- cerebellum_cortex_transcripts_results.Rds

- cerebellum_gene_results.Rds

- cerebellum_hippocampus_results.Rds

- cerebellum_hippocampus_transcripts_results.Rds

- cerebellum_sex_gene_results.Rds

- cerebellum_sex_transcript_results.Rds

- cerebellum_striatum_results.Rds

- cerebellum_striatum_transcripts_results.Rds

- cerebellum_transcript_results.Rds

- cortex_gene_results.Rds

- cortex_hippocampus_results.Rds

- cortex_hippocampus_transcripts_results.Rds

- cortex_sex_gene_results.Rds

- cortex_sex_transcript_results.Rds

- cortex_striatum_results.Rds

- cortex_striatum_transcripts_results.Rds

- cortex_transcript_results.Rds

- hippocampus_gene_results.Rds

- hippocampus_sex_gene_results.Rds

- hippocampus_sex_transcript_results.Rds

- hippocampus_striatum_transcripts_results.Rds

- hippocampus_transcript_results.Rds

- striatum_gene_results.Rds

- striatum_hippocampus_results.Rds

- striatum_hippocampus_transcripts_results.Rds

- striatum_sex_gene_results.Rds

- striatum_sex_transcript_results.Rds

- striatum_transcript_results.Rds

- gencode_annotations/: This directory contains the exact GENCODE genome and transcriptome annotations used for our analyses.

- GRCm39.primary_assembly.genome.fa

- GRCm39.primary_assembly.genome.fa.fai

- gencode.vM31.primary_assembly.annotation.gtf

- gffread/: This directory contains the generated fasta files with exact isoform sequences for novel and annotated genes required for creating isoformSwitchAnalyzeR objects.

- isoform_sequences.fa

- isoform_sequences_linear.fa

- nextflow/: All files in the subdirectories of the overarching nextflow directory are direct outputs from the nf-core nanoseq pipeline. For specific information on nanoseq pipeline outputs, please refer to https://nf-co.re/nanoseq/3.1.0/docs/output

- bambu/

- counts_gene.txt

- counts_transcript.txt

- extended_annotations.gtf

- extended_annotations.gtf.idx

- versions.yml

- fastqc/

- There are 2 files for each of the 40 samples. Below is a representative example of the files expected for each sample:

- sample01_R1_1_fastqc.html

- sample01_R1_1_fastqc.zip

- minimap2/

- bam/: This directory has been removed to save space. Please contact us for more information.

- bigBed/

- There is 1 file for each of the 40 samples. Below is a representative example of the file expected for each sample:

- sample01_R1.bigBed

- bigWig/

- There is 1 file for each of the 40 samples. Below is a representative example of the file expected for each sample:

- sample01_R1.bigWig

- genome/

- GRCm39.primary_assembly.genome.fa.mmi

- samtools_stats/

- There are 3 files for each of the 40 samples. Below is a representative example of the files expected for each sample:

- sample01_R1.sorted.bam.flagstat

- sample01_R1.sorted.bam.idxstats

- sample01_R1.sorted.bam.stats

- multiqc/

- multiqc_data/

- mqc_samtools-idxstats-mapped-reads-plot_Normalised_Counts.txt

- mqc_samtools-idxstats-mapped-reads-plot_Observed_over_Expected_Counts.txt

- mqc_samtools-idxstats-mapped-reads-plot_Raw_Counts.txt

- mqc_samtools-idxstats-xy-plot_1.txt

- mqc_samtools_alignment_plot_1.txt

- multiqc.log

- multiqc_data.json

- multiqc_general_stats.txt

- multiqc_samtools_flagstat.txt

- multiqc_samtools_idxstats.txt

- multiqc_samtools_stats.txt

- multiqc_sources.txt

- multiqc_plots/

- pdf/

- mqc_samtools-idxstats-mapped-reads-plot_Normalised_Counts.pdf

- mqc_samtools-idxstats-mapped-reads-plot_Observed_over_Expected_Counts.pdf

- mqc_samtools-idxstats-mapped-reads-plot_Raw_Counts.pdf

- mqc_samtools-idxstats-xy-plot_1.pdf

- mqc_samtools-idxstats-xy-plot_1_pc.pdf

- mqc_samtools_alignment_plot_1.pdf

- mqc_samtools_alignment_plot_1_pc.pdf

- png/

- *The same multiqc plots as the pdf directory, but in png format*

- svg/

- *The same multiqc plots as the pdf and png directory, but in svg format*

- multiqc_report.html

- versions.yml

- nanoplot/

- fastq/

- Contains 40 directories for 40 samples, each containing 12 files obtained from running nanoplot with the nf-core nanoseq pipeline. Below is a representative example, but this repo contains 1 directory per sample:

- sample01_R1/

- Dynamic_Histogram_Read_length.html

- HistogramReadlength.png

- LengthvsQualityScatterPlot_dot.png

- LengthvsQualityScatterPlot_kde.png

- LogTransformed_HistogramReadlength.png

- NanoPlot-report.html

- NanoPlot_20230413_1600.log

- NanoPlot_20230413_2047.log

- NanoStats.txt

- Weighted_HistogramReadlength.png

- Weighted_LogTransformed_HistogramReadlength.png

- Yield_By_Length.png

- pipeline_info/

- execution_report_2023-04-13_15-46-11.html

- execution_timeline_2023-04-13_15-46-11.html

- execution_trace_2023-04-13_10-59-24.txt

- execution_trace_2023-04-13_15-46-11.txt

- pipeline_dag_2023-04-13_15-46-11.svg

- samplesheet.valid.csv

- software_versions.yml

- switchlist_fasta/: This directory contains the generated fasta files for amino acids and nucleotides for individual isoformSwitchAnalyzeR objects required for downstream analyses.

- cerebellum_AA.fasta

- cerebellum_nt.fasta

- cerebellum_sex_AA.fasta

- cerebellum_sex_nt.fasta

- cortex_AA.fasta

- cortex_nt.fasta

- cortex_sex_AA.fasta

- cortex_sex_nt.fasta

- hippocampus_AA.fasta

- hippocampus_nt.fasta

- region_region_AA.fasta

- region_region_nt.fasta

- striatum_AA.fasta

- striatum_nt.fasta

- striatum_sex_AA.fasta

- striatum_sex_nt.fasta

- switchlist_objects/: This directory contains intermediate and final isoformSwitchAnalyzeR objects. “Region_all” in the filename denotes a list of four switchlists that each compare a single brain region (cerebellum, cortex, hippocampus, striatum) to all others in aggregate. “Region_sex” in the filename denotes a list of four switchlists (cerebellum, cortex, hippocampus, striatum) that compare across sexes (male and female). “Region_region” denotes a single switchlist that includes all pairwise region comparisons. “Sex” in the filename without “region” denotes a male vs female comparison across all regions in aggregate.

- de_added/: This directory contains final isoformSwitchAnalyzeR objects with open reading frame and differential expression results incorporated.

- region_all_switchlist_list_orf_de.Rds

- region_region_orf_de.Rds

- region_sex_switchlist_list_orf_de.Rds

- orf_added/: This directory contains intermediate and final isoformSwitchAnalyzeR objects with open reading frame information added.

- region_all_switchlist_list.Rds

- region_region_switchlist_analyzed.Rds

- region_sex_switchlist_list.Rds

- sex_switchlist_analyzed.Rds

- pfam_added/: This directory contains final isoformSwitchAnalyzeR objects (including de and orf information) with added protein domain information. Please note pfam does not comprehensively identify all protein domains for every gene.

- region_all_list_orf_de_pfam.Rds

- region_region_orf_de_pfam.Rds

- region_sex_list_orf_de_pfam.Rds

- raw/: This directory contains the initial isoformSwitchAnalyzeR objects, without additional information added.

- region_all_switchlist_list.Rds

- region_region_switchlist_analyzed.Rds

- region_sex_switchlist_list.Rds

- sex_switchlist.Rds
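The deseq2_data/ naming convention described above can be sketched as a small Python helper. This is a minimal sketch and not part of the project's code: the function name `describe_deseq2_file`, the `Deseq2File` record, and the labels "pairwise", "one_vs_all", and "male_vs_female" are hypothetical names chosen for illustration. The description does not state the analysis level for pairwise files whose names contain neither "gene" nor "transcript", so the helper reports "unspecified" rather than guessing.

```python
from dataclasses import dataclass
from typing import Tuple

# The four brain regions named in the dataset description.
REGIONS = ("cerebellum", "cortex", "hippocampus", "striatum")


@dataclass(frozen=True)
class Deseq2File:
    """Hypothetical record describing one file in deseq2_data/."""
    regions: Tuple[str, ...]
    level: str       # "gene", "transcript", or "unspecified"
    comparison: str  # "pairwise", "one_vs_all", or "male_vs_female"


def describe_deseq2_file(filename: str) -> Deseq2File:
    """Interpret a deseq2_data/ file name using the stated convention."""
    stem = filename.rsplit(".", 1)[0]  # drop the .Rds extension
    tokens = stem.split("_")

    regions = tuple(t for t in tokens if t in REGIONS)
    if "all" in tokens and "regions" in tokens:
        # e.g. all_regions_sex_gene_results.Rds
        regions = REGIONS

    # "transcript" and "transcripts" both appear in the listed file names.
    if any(t.startswith("transcript") for t in tokens):
        level = "transcript"
    elif "gene" in tokens:
        level = "gene"
    else:
        level = "unspecified"  # assumption: the description does not say

    if "sex" in tokens:
        comparison = "male_vs_female"
    elif len(regions) == 2:
        comparison = "pairwise"
    elif len(regions) == 1:
        comparison = "one_vs_all"
    else:
        raise ValueError(f"Cannot interpret file name: {filename}")

    return Deseq2File(regions, level, comparison)


if __name__ == "__main__":
    for name in (
        "cerebellum_cortex_transcripts_results.Rds",
        "cortex_sex_gene_results.Rds",
        "striatum_gene_results.Rds",
    ):
        print(name, describe_deseq2_file(name))
```

The files themselves are R objects, so a helper like this is only useful for indexing or selecting results before loading them in R.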

Files

Files (14.5 GB)

Name	Size
mouse_brain_iso_div_data.tar.gz md5:845556f11acdfd69df2d9bd97793549c	14.5 GB
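Because the archive is 14.5 GB, it may be worth checking the download against the MD5 checksum listed above before extracting it. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `archive_is_intact` and the local path in the usage section are assumptions for illustration, not part of the dataset or its code.

```python
import hashlib
from pathlib import Path

# MD5 checksum as listed in the Files table of this record.
EXPECTED_MD5 = "845556f11acdfd69df2d9bd97793549c"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that a multi-gigabyte archive is never held in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Compare the file's MD5 digest with the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the archive was saved.
    archive = Path("mouse_brain_iso_div_data.tar.gz")
    if archive.exists():
        print("checksum OK" if archive_is_intact(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

A mismatch usually indicates an incomplete or corrupted download, in which case downloading the archive again is the simplest remedy.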

Additional details

Is supplement to: Software: https://github.com/lasseignelab/230227_EJ_MouseBrainIsoDiv/tree/main (URL); Software: https://lasseignelab.shinyapps.io/mouse_brain_iso_div/ (URL); Software: 10.5281/zenodo.10481312 (DOI)
Is supplemented by: Workflow: 10.5281/zenodo.10480924 (DOI)

National Human Genome Research Institute
Integrating multidimensional genomic data to discover clinically-relevant predictive models R00HG009678

	All versions	This version
Views	150	150
Downloads	187	187
Data volume	2.7 TB	2.7 TB
