Published December 14, 2023 | Version v1
Dataset Open

Data for Long-read RNA sequencing identifies region- and sex-specific C57BL/6J mouse brain mRNA isoform expression and usage

Contributors

  • 1. University of Alabama at Birmingham

Description

data_minus_bam.tar.gz contains all files from the data directory (except for bam outputs) associated with the 230227_EJ_MouseBrainIsoDiv GitHub project and includes the following:

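- comparison_gene_lists/: The RData file in this directory contains all comparison gene lists with differential gene expression (DGE), differential transcript expression (DTE), and differential transcript usage (DTU) results, for importing into the R environment and reproducing analyses.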

  - all_comparison_gene_lists.Rdata

- cpm_out/: The RData file in this directory contains the processed counts per million (CPM) and formatted metadata for downstream analyses.

  - cpm_counts_metadata.RData

- deseq2_data/: All files in this directory are Rds files with DESeq2 results for the study design indicated in the file name. A file name that includes “gene” indicates a gene-level analysis, and “transcript” indicates a transcript-level analysis. A file name with two regions is a comparison between those two regions; a file name with one region denotes either “one vs all” or “male vs female”. Any file name that includes “sex” is male vs female in the indicated region(s).

  - all_regions_sex_gene_results.Rds

  - all_regions_sex_transcript_results.Rds

  - cerebellum_cortex_results.Rds

  - cerebellum_cortex_transcripts_results.Rds

  - cerebellum_gene_results.Rds

  - cerebellum_hippocampus_results.Rds

  - cerebellum_hippocampus_transcripts_results.Rds

  - cerebellum_sex_gene_results.Rds

  - cerebellum_sex_transcript_results.Rds

  - cerebellum_striatum_results.Rds

  - cerebellum_striatum_transcripts_results.Rds

  - cerebellum_transcript_results.Rds

  - cortex_gene_results.Rds

  - cortex_hippocampus_results.Rds

  - cortex_hippocampus_transcripts_results.Rds

  - cortex_sex_gene_results.Rds

  - cortex_sex_transcript_results.Rds

  - cortex_striatum_results.Rds

  - cortex_striatum_transcripts_results.Rds

  - cortex_transcript_results.Rds

  - hippocampus_gene_results.Rds

  - hippocampus_sex_gene_results.Rds

  - hippocampus_sex_transcript_results.Rds

  - hippocampus_striatum_transcripts_results.Rds

  - hippocampus_transcript_results.Rds

  - striatum_gene_results.Rds

  - striatum_hippocampus_results.Rds

  - striatum_hippocampus_transcripts_results.Rds

  - striatum_sex_gene_results.Rds

  - striatum_sex_transcript_results.Rds

  - striatum_transcript_results.Rds
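The naming rules for deseq2_data/ can be sketched as a small Python helper. This is only an illustration of the conventions described for this directory, not code from the project; the function name `describe_deseq2_file` and the returned labels are hypothetical. Pairwise files such as cerebellum_cortex_results.Rds carry neither “gene” nor “transcript” in the name, so the sketch reports their level as not stated rather than guessing.

```python
REGIONS = ("cerebellum", "cortex", "hippocampus", "striatum")


def describe_deseq2_file(filename):
    """Summarize a deseq2_data/ file name using the stated naming rules."""
    stem = filename.rsplit(".", 1)[0]
    tokens = stem.split("_")

    # "all_regions" in the name stands for every region in the study.
    if tokens[:2] == ["all", "regions"]:
        regions = list(REGIONS)
    else:
        regions = [t for t in tokens if t in REGIONS]

    if "gene" in tokens:
        level = "gene"
    elif "transcript" in tokens or "transcripts" in tokens:
        level = "transcript"
    else:
        level = "not stated in file name"

    # "sex" takes priority: any such file is male vs female.
    if "sex" in tokens:
        comparison = "male vs female"
    elif len(regions) == 2:
        comparison = "region vs region"
    elif len(regions) == 1:
        comparison = "one vs all"
    else:
        comparison = "unknown"

    return {"regions": regions, "level": level, "comparison": comparison}


print(describe_deseq2_file("cerebellum_sex_transcript_results.Rds"))
print(describe_deseq2_file("cortex_striatum_results.Rds"))
```

The helper only interprets file names; reading the Rds contents still requires R.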

- gencode_annotations/: This directory contains the exact GENCODE genome and transcriptome annotations used for our analyses.

  - GRCm39.primary_assembly.genome.fa

  - GRCm39.primary_assembly.genome.fa.fai

  - gencode.vM31.primary_assembly.annotation.gtf

- gffread/: This directory contains the generated FASTA files with exact isoform sequences for novel and annotated genes, which are required for creating IsoformSwitchAnalyzeR objects.

  - isoform_sequences.fa

  - isoform_sequences_linear.fa

- nextflow/: All files in the subdirectories of this nextflow directory are direct outputs of the nf-core nanoseq pipeline. For specific information on nanoseq pipeline outputs, please refer to https://nf-co.re/nanoseq/3.1.0/docs/output

  - bambu/

    - counts_gene.txt

    - counts_transcript.txt

    - extended_annotations.gtf

    - extended_annotations.gtf.idx

    - versions.yml

  - fastqc/

    - There are 2 files for each of the 40 samples. Below is a representative example for one sample:

      - sample01_R1_1_fastqc.html

      - sample01_R1_1_fastqc.zip

  - minimap2/

    - bam/: This directory has been removed to save space; please contact us for more information.

    - bigBed/

      - There is 1 file for each of the 40 samples. Below is a representative example for one sample:

        - sample01_R1.bigBed

    - bigWig/

      - There is 1 file for each of the 40 samples. Below is a representative example for one sample:

        - sample01_R1.bigWig

    - genome/

      - GRCm39.primary_assembly.genome.fa.mmi

    - samtools_stats/

      - There are 3 files for each of the 40 samples. Below is a representative example for one sample:

        - sample01_R1.sorted.bam.flagstat

        - sample01_R1.sorted.bam.idxstats

        - sample01_R1.sorted.bam.stats

  - multiqc/

    - multiqc_data/

      - mqc_samtools-idxstats-mapped-reads-plot_Normalised_Counts.txt

      - mqc_samtools-idxstats-mapped-reads-plot_Observed_over_Expected_Counts.txt

      - mqc_samtools-idxstats-mapped-reads-plot_Raw_Counts.txt

      - mqc_samtools-idxstats-xy-plot_1.txt

      - mqc_samtools_alignment_plot_1.txt

      - multiqc.log

      - multiqc_data.json

      - multiqc_general_stats.txt

      - multiqc_samtools_flagstat.txt

      - multiqc_samtools_idxstats.txt

      - multiqc_samtools_stats.txt

      - multiqc_sources.txt

    - multiqc_plots/

      - pdf/

        - mqc_samtools-idxstats-mapped-reads-plot_Normalised_Counts.pdf

        - mqc_samtools-idxstats-mapped-reads-plot_Observed_over_Expected_Counts.pdf

        - mqc_samtools-idxstats-mapped-reads-plot_Raw_Counts.pdf

        - mqc_samtools-idxstats-xy-plot_1.pdf

        - mqc_samtools-idxstats-xy-plot_1_pc.pdf

        - mqc_samtools_alignment_plot_1.pdf

        - mqc_samtools_alignment_plot_1_pc.pdf

      - png/

        - *The same multiqc plots as the pdf directory, but in png format*

      - svg/

        - *The same multiqc plots as the pdf and png directories, but in svg format*

    - multiqc_report.html

    - versions.yml

  - nanoplot/

    - fastq/

      - Contains 40 directories, one per sample, each containing 12 files obtained from running NanoPlot with the nf-core nanoseq pipeline. Below is a representative example for one sample:

        - sample01_R1/

          - Dynamic_Histogram_Read_length.html

          - HistogramReadlength.png

          - LengthvsQualityScatterPlot_dot.png

          - LengthvsQualityScatterPlot_kde.png

          - LogTransformed_HistogramReadlength.png

          - NanoPlot-report.html

          - NanoPlot_20230413_1600.log

          - NanoPlot_20230413_2047.log

          - NanoStats.txt

          - Weighted_HistogramReadlength.png

          - Weighted_LogTransformed_HistogramReadlength.png

          - Yield_By_Length.png

  - pipeline_info/

    - execution_report_2023-04-13_15-46-11.html

    - execution_timeline_2023-04-13_15-46-11.html

    - execution_trace_2023-04-13_10-59-24.txt

    - execution_trace_2023-04-13_15-46-11.txt

    - pipeline_dag_2023-04-13_15-46-11.svg

    - samplesheet.valid.csv

    - software_versions.yml
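The per-sample file counts stated for the nextflow/ outputs (2 files per sample in fastqc/, 1 in bigBed/ and bigWig/, 3 in samtools_stats/) can be checked after extraction with a sketch like the one below. It assumes every sample ID follows the `sample01_R1` pattern of the representative examples; the description only shows sample01, so that pattern, like the function names, is an assumption to adjust as needed.

```python
import re
from collections import Counter

# Assumption: sample IDs look like "sample01_R1" (only sample01 is shown
# in the description, so other samples may be named differently).
SAMPLE_ID = re.compile(r"^(sample\d+_R\d+)")


def files_per_sample(filenames):
    """Count how many file names belong to each sample ID."""
    counts = Counter()
    for name in filenames:
        match = SAMPLE_ID.match(name)
        if match:
            counts[match.group(1)] += 1
    return counts


def samples_with_unexpected_counts(filenames, expected_per_sample):
    """Return {sample_id: count} for samples whose count differs."""
    counts = files_per_sample(filenames)
    return {s: n for s, n in counts.items() if n != expected_per_sample}


# Small in-memory example in the style of samtools_stats/ (3 files expected).
example = [
    "sample01_R1.sorted.bam.flagstat",
    "sample01_R1.sorted.bam.idxstats",
    "sample01_R1.sorted.bam.stats",
    "sample02_R1.sorted.bam.stats",
]
print(samples_with_unexpected_counts(example, 3))  # {'sample02_R1': 1}
```

In practice the list of names would come from `os.listdir()` on one of the extracted directories, and `len(files_per_sample(names))` would be compared against the 40 samples.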

- switchlist_fasta/: This directory contains the generated amino acid (AA) and nucleotide (nt) FASTA files for individual IsoformSwitchAnalyzeR objects required for downstream analyses.

  - cerebellum_AA.fasta

  - cerebellum_nt.fasta

  - cerebellum_sex_AA.fasta

  - cerebellum_sex_nt.fasta

  - cortex_AA.fasta

  - cortex_nt.fasta

  - cortex_sex_AA.fasta

  - cortex_sex_nt.fasta

  - hippocampus_AA.fasta

  - hippocampus_nt.fasta

  - region_region_AA.fasta

  - region_region_nt.fasta

  - striatum_AA.fasta

  - striatum_nt.fasta

  - striatum_sex_AA.fasta

  - striatum_sex_nt.fasta

- switchlist_objects/: This directory contains intermediate and final IsoformSwitchAnalyzeR objects. “Region_all” in the file name denotes a list of four switchlists, each comparing a single brain region (cerebellum, cortex, hippocampus, striatum) to all others in aggregate. “Region_sex” in the file name denotes a list of four switchlists (cerebellum, cortex, hippocampus, striatum), each comparing across sexes (male and female). “Region_region” denotes a single switchlist that includes all pairwise region comparisons. “Sex” in the file name without “region” denotes a comparison across sexes with all regions in aggregate.

  - de_added/: This directory contains final IsoformSwitchAnalyzeR objects with open reading frame and differential expression results incorporated.

    - region_all_switchlist_list_orf_de.Rds

    - region_region_orf_de.Rds

    - region_sex_switchlist_list_orf_de.Rds

  - orf_added/: This directory contains intermediate and final IsoformSwitchAnalyzeR objects with open reading frame information added.

    - region_all_switchlist_list.Rds

    - region_region_switchlist_analyzed.Rds

    - region_sex_switchlist_list.Rds

    - sex_switchlist_analyzed.Rds

  - pfam_added/: This directory contains final IsoformSwitchAnalyzeR objects (including differential expression and open reading frame information) with added protein domain information. Please note that Pfam does not comprehensively identify all protein domains for every gene.

    - region_all_list_orf_de_pfam.Rds

    - region_region_orf_de_pfam.Rds

    - region_sex_list_orf_de_pfam.Rds

  - raw/: This directory contains the initial IsoformSwitchAnalyzeR objects, without additional information added.

    - region_all_switchlist_list.Rds

    - region_region_switchlist_analyzed.Rds

    - region_sex_switchlist_list.Rds

    - sex_switchlist.Rds
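To confirm that a downloaded copy of data_minus_bam.tar.gz contains the directories described above, the archive members can be listed with Python's standard `tarfile` module. The sketch below is a minimal check, not part of the project; the function names are hypothetical, and it makes no assumption about the name of the archive's root folder (it only looks for the described directory names anywhere in each member path).

```python
import tarfile

# Top-level directories named in the description.
EXPECTED_DIRS = {
    "comparison_gene_lists",
    "cpm_out",
    "deseq2_data",
    "gencode_annotations",
    "gffread",
    "nextflow",
    "switchlist_fasta",
    "switchlist_objects",
}


def described_dirs_present(member_names):
    """Return the described directory names found in any member path."""
    found = set()
    for name in member_names:
        found.update(part for part in name.split("/") if part in EXPECTED_DIRS)
    return found


def missing_dirs(archive_path):
    """Return described directories that do not appear in the archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return EXPECTED_DIRS - described_dirs_present(archive.getnames())
```

Calling `missing_dirs("data_minus_bam.tar.gz")` should return an empty set for a complete download. Listing a gzip-compressed archive of this size requires reading through the whole file, so the call may take a while.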

Files

Files (14.5 GB)

md5:845556f11acdfd69df2d9bd97793549c (14.5 GB)
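Because the archive is 14.5 GB, it is worth verifying the download against the MD5 checksum listed above before extracting it. The following is a minimal sketch using Python's standard `hashlib`; the function names and the chunk size are illustrative choices, and the file is read in chunks so it never has to fit in memory.

```python
import hashlib

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "845556f11acdfd69df2d9bd97793549c"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("data_minus_bam.tar.gz")` returns `True` only if the local file is byte-identical to the published one.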

Additional details

Funding

Integrating multidimensional genomic data to discover clinically-relevant predictive models R00HG009678
National Human Genome Research Institute