Published March 26, 2022 | Version 1.0
Dataset Open

Gene expression count data from human post-mortem spinal cord

Creators

  • 1. Mount Sinai

Contributors

Contact person:

  • 1. Mount Sinai

Description

Gene expression data from human post-mortem tissue for three spinal cord sections (cervical, thoracic and lumbar) from amyotrophic lateral sclerosis (ALS) patients and non-neurological disease controls. RNA sequencing performed as part of the New York Genome Center ALS Consortium.

Analysis workbooks: https://jackhump.github.io/ALS_SpinalCord_QTLs/ 

Preprint describing results: https://www.medrxiv.org/content/10.1101/2021.08.31.21262682v1 

Sample sizes:

Region     Control   ALS
Cervical   35        139
Thoracic   10        42
Lumbar     32        122

Library preparation

RNA was extracted from flash-frozen post-mortem tissue using TRIzol (Thermo Fisher Scientific) and chloroform, followed by column purification (RNeasy Mini Kit, QIAGEN). RNA integrity number (RIN) was assessed on a Bioanalyzer (Agilent Technologies). RNA-Seq libraries were prepared from 500 ng of total RNA using the KAPA Stranded RNA-Seq Kit with RiboErase (KAPA Biosystems) for rRNA depletion and Illumina-compatible indexes (NEXTflex RNA-Seq Barcodes, NOVA-512915, PerkinElmer, and IDT for Illumina TruSeq UD Indexes, 20022370). Pooled libraries (average insert size: 375 bp) passing the quality criteria were sequenced on either an Illumina HiSeq 2500 (125 bp paired-end) or an Illumina NovaSeq (100 bp paired-end). The samples had a median sequencing depth of 42 million read pairs, ranging from 16 to 167 million read pairs.

Data processing

Samples were uniformly processed using RAPiD-nf, an efficient RNA-Seq processing pipeline implemented in the Nextflow framework. Following adapter trimming with Trimmomatic (version 0.36), all samples were aligned to the hg38 build (GRCh38.primary_assembly) of the human reference genome using STAR (2.7.2a), with indexes created from GENCODE version 30. Gene expression was quantified using RSEM (1.3.1) with the GENCODE v30 annotation. Quality control was performed using SAMtools and Picard, and the results were collated using MultiQC. Various technical metrics for sequencing quality control are provided in the metadata. Estimated read count and normalised transcripts per million (TPM) matrices are provided for each tissue.
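For readers unfamiliar with the unit, TPM can be written as below. This is the standard definition, not a quote from the RAPiD-nf or RSEM source; the symbols c_i (estimated read count for transcript i) and \tilde{\ell}_i (its effective length) are notation introduced here for illustration. RSEM reports a gene-level TPM as the sum over that gene's isoforms.

```latex
\mathrm{TPM}_i \;=\; 10^{6} \times
  \frac{c_i / \tilde{\ell}_i}{\sum_{j} c_j / \tilde{\ell}_j},
\qquad\text{so that}\qquad
\sum_{i} \mathrm{TPM}_i = 10^{6} \text{ within each sample.}
```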

Provided data:

gencode.v30.gene_meta.tsv.gz - a tab-separated table with two columns: "genename" (the HGNC gene symbol) and "geneid" (the Ensembl ID), as set in the GENCODE v30 comprehensive annotation.
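A minimal sketch of reading this table with the Python standard library, assuming the file has a header row that uses the two column names given above. The function name load_gene_symbols is hypothetical and not part of the dataset.

```python
import csv
import gzip


def load_gene_symbols(path):
    """Map each Ensembl gene ID ("geneid") to its HGNC symbol ("genename").

    Assumes a gzip-compressed, tab-separated file whose header row
    contains the column names "genename" and "geneid".
    """
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["geneid"]: row["genename"] for row in reader}
```

Usage, once the file has been downloaded to the working directory: symbols = load_gene_symbols("gencode.v30.gene_meta.tsv.gz").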

For {tissue} in Cervical_Spinal_Cord, Thoracic_Spinal_Cord, Lumbar_Spinal_Cord:

{tissue}_metadata.tsv.gz - sample metadata, one row per sample. Each column is described below.

{tissue}_gene_tpm.tsv.gz - the normalised TPM values from RSEM for all 58,884 genes in GENCODE v30. Each row describes a gene and each column describes a sample.

{tissue}_gene_counts.tsv.gz - the estimated read counts from RSEM for all 58,884 genes in GENCODE v30. Each row describes a gene and each column describes a sample.
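The matrices described above can be read with the standard library alone. The sketch below assumes that the first row holds the sample IDs and that the first field of every later row is the gene identifier; the exact header layout is not stated here, so the code tolerates a header with or without a label above the gene column. The function names are hypothetical.

```python
import csv
import gzip


def read_expression_matrix(path):
    """Read a gene-by-sample matrix into (sample_ids, {gene_id: [values]})."""
    with gzip.open(path, mode="rt", newline="") as handle:
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    # Assumption: each data row is one gene ID followed by one value per sample.
    n_samples = len(body[0]) - 1
    # Take the trailing header cells so a missing gene-column label is harmless.
    sample_ids = header[-n_samples:]
    values = {row[0]: [float(x) for x in row[1:]] for row in body}
    return sample_ids, values


def column_sums(sample_ids, values):
    """Sum every sample column, e.g. to check that TPM totals are near 1e6."""
    totals = [0.0] * len(sample_ids)
    for gene_values in values.values():
        for i, value in enumerate(gene_values):
            totals[i] += value
    return dict(zip(sample_ids, totals))
```

For a TPM file, each total from column_sums should be close to one million. For a counts file, it gives the estimated number of reads assigned to genes in that sample.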

Metadata Column Description

rna_id  - de-identified sample ID for each unique RNA-seq sample

dna_id - de-identified donor ID for each patient enrolled in the study

site_id - de-identified site name for each contributing site

tissue - name of tissue/region

age_rounded - age at death, rounded to nearest decade

sex - biological sex of donor

subject_group - long form disease group

disease - short form disease group

site_of_motor_onset - for ALS donors, where did symptoms start?

disease_duration - for ALS donors, how long did donor live with disease? 

mutations - any known ALS gene mutations

library_prep - type of library preparation method used

seq_platform - sequencing platform used

rin - RNA integrity number, 0-10

c9orf72_repeat_size - estimated C9orf72 repeat expansion size

gPC1 to gPC5 - principal components of genetic ancestry from whole-genome sequencing

Remaining metadata columns are from Picard - see here: http://broadinstitute.github.io/picard/picard-metric-definitions.html#RnaSeqMetrics 
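The columns described above can be used to tally samples per group and to check the metadata against an expression matrix. This is a minimal sketch: it relies only on the rna_id and disease column names listed above, and it assumes, without confirmation from this record, that the matrix column names are rna_id values. The function names are hypothetical.

```python
import csv
import gzip
from collections import Counter


def read_metadata(path):
    """Read a {tissue}_metadata.tsv.gz file into a list of dicts, one per sample."""
    with gzip.open(path, mode="rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def count_by(rows, column="disease"):
    """Count samples per distinct value of a metadata column."""
    return Counter(row[column] for row in rows)


def missing_samples(rows, sample_ids):
    """List rna_id values in the metadata that have no matching matrix column.

    Assumption: expression matrix columns are named by rna_id.
    """
    known = set(sample_ids)
    return [row["rna_id"] for row in rows if row["rna_id"] not in known]
```

Comparing count_by(rows) with the sample size table is a quick way to confirm that a download is complete.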

 

Files

Files (37.5 MB)

md5:ef2509aa14b14a539f30a93e31a8626c  16.5 MB
md5:d9f5386bb6a20a8761b06619feb20cab  452.5 kB
md5:4eb7b9be356230c36e31ca987e10f912  14.8 MB
md5:8fa6a81bad47eff1de9d411be0c563ca  5.6 MB

Additional details

Related works

Cites
Preprint: 10.1101/2021.08.31.21262682 (DOI)