Gene expression count data from human post-mortem spinal cord
Description
Gene expression data from human post-mortem tissue for three spinal cord sections (cervical, thoracic and lumbar) from amyotrophic lateral sclerosis (ALS) patients and non-neurological disease controls. RNA sequencing was performed as part of the New York Genome Center ALS Consortium.
Analysis workbooks: https://jackhump.github.io/ALS_SpinalCord_QTLs/
Preprint describing results: https://www.medrxiv.org/content/10.1101/2021.08.31.21262682v1
Sample sizes:
Region | Control | ALS |
---|---|---|
Cervical | 35 | 139 |
Thoracic | 10 | 42 |
Lumbar | 32 | 122 |
Library preparation
RNA was extracted from flash-frozen post-mortem tissue using TRIzol (Thermo Fisher Scientific) and chloroform, followed by column purification (RNeasy Minikit, QIAGEN). RNA integrity number (RIN) was assessed on a Bioanalyzer (Agilent Technologies). RNA-Seq libraries were prepared from 500 ng total RNA using the KAPA Stranded RNA-Seq Kit with RiboErase (KAPA Biosystems) for rRNA depletion and Illumina-compatible indexes (NEXTflex RNA-Seq Barcodes, NOVA-512915, PerkinElmer, and IDT for Illumina TruSeq UD Indexes, 20022370). Pooled libraries (average insert size: 375 bp) passing the quality criteria were sequenced on either an Illumina HiSeq 2500 (125 bp paired-end) or an Illumina NovaSeq (100 bp paired-end). The samples had a median sequencing depth of 42 million read pairs, with a range of 16 to 167 million read pairs.
Data processing
Samples were uniformly processed using RAPiD-nf, an efficient RNA-Seq processing pipeline implemented in the Nextflow framework. Following adapter trimming with Trimmomatic (version 0.36), all samples were aligned to the hg38 build (GRCh38.primary_assembly) of the human reference genome using STAR (2.7.2a), with indexes created from GENCODE version 30. Gene expression was quantified using RSEM (1.3.1) with the GENCODE v30 annotation. Quality control was performed using SAMtools and Picard, and the results were collated using MultiQC. Various technical metrics for sequencing quality control are provided in the metadata. Matrices of estimated read counts and normalised transcripts per million (TPM) are provided for each tissue.
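For reference, TPM expresses a gene's length-normalised read rate, scaled so that all genes in a sample sum to one million. The sketch below illustrates that definition only. It is not the RSEM implementation, the function name `counts_to_tpm` is hypothetical, and it assumes per-gene effective lengths are available, which are not among the files provided here.

```python
def counts_to_tpm(counts, effective_lengths):
    """Convert per-gene read counts for ONE sample to TPM.

    counts            -- estimated read counts, one per gene
    effective_lengths -- effective gene lengths in bases (assumed input,
                         not part of this dataset)
    """
    if len(counts) != len(effective_lengths):
        raise ValueError("counts and effective_lengths must have equal length")

    # Reads per base for each gene; genes with a non-positive effective
    # length cannot be normalised and are given a rate of zero.
    rates = [
        count / length if length > 0 else 0.0
        for count, length in zip(counts, effective_lengths)
    ]

    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)

    # Scale so that the values for the sample sum to one million.
    return [rate / total * 1e6 for rate in rates]
```

Because of this scaling, TPM values are comparable within a sample but remain relative abundances, so the count matrices are usually the better starting point for differential expression tools that model counts directly.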
Provided data:
gencode.v30.gene_meta.tsv.gz - tab-separated table with two columns: "genename" (the HGNC gene symbol) and "geneid" (the Ensembl ID), as set in the GENCODE v30 comprehensive annotation.
For {tissue} in Cervical_Spinal_Cord, Thoracic_Spinal_Cord, Lumbar_Spinal_Cord:
{tissue}_metadata.tsv.gz - sample metadata, with one row per sample. Each column is described below.
{tissue}_gene_tpm.tsv.gz - the normalised TPM values from RSEM for all 58,884 genes in GENCODE v30. Each row describes a gene and each column describes a sample.
{tissue}_gene_counts.tsv.gz - the estimated read counts from RSEM for all 58,884 genes in GENCODE v30. Each row describes a gene and each column describes a sample.
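A minimal sketch of reading one of the expression matrices with the Python standard library. It assumes the first column of each matrix holds the gene ID and the header row holds the sample IDs; neither detail is stated above, so check the header of the downloaded file first. The function name `read_expression_matrix` is hypothetical.

```python
import csv
import gzip


def read_expression_matrix(path):
    """Read a gzipped, tab-separated gene-by-sample matrix.

    Returns (sample_ids, values), where values maps each gene ID to a
    list of floats in the same order as sample_ids.

    Assumption: column 1 is the gene ID and every other column is a sample.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        sample_ids = header[1:]
        values = {}
        for row in reader:
            if not row:
                continue  # skip blank lines
            values[row[0]] = [float(x) for x in row[1:]]
    return sample_ids, values
```

For example, calling `read_expression_matrix("Cervical_Spinal_Cord_gene_tpm.tsv.gz")` on a downloaded file would return the sample IDs and a dictionary of per-gene TPM values, under the assumptions above.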
Metadata Column Description
rna_id - de-identified sample ID for each unique RNA-seq sample
dna_id - de-identified donor ID for each patient enrolled in the study
site_id - de-identified site name for each contributing site
tissue - name of tissue/region
age_rounded - age at death, rounded to nearest decade
sex - biological sex of donor
subject_group - long form disease group
disease - short form disease group
site_of_motor_onset - for ALS donors, where did symptoms start?
disease_duration - for ALS donors, how long did donor live with disease?
mutations - any known ALS gene mutations
library_prep - type of library preparation method used
seq_platform - sequencing platform used
rin - RNA integrity number, 0-10
c9orf72_repeat_size - estimated C9orf72 repeat expansion size
gPC1 to gPC5 - first five principal components of genetic ancestry from whole genome sequencing
Remaining metadata columns are from Picard - see here: http://broadinstitute.github.io/picard/picard-metric-definitions.html#RnaSeqMetrics
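One quick check after downloading is to tally samples per disease group in a metadata file and compare the result with the sample sizes table. The sketch below assumes only that the metadata file is a gzipped TSV with a header row containing the columns listed above. The labels used in the `disease` column are not documented here, so the code counts whatever values it finds. The function name `count_by_column` is hypothetical.

```python
import csv
import gzip
from collections import Counter


def count_by_column(metadata_path, column="disease"):
    """Count samples for each distinct value of one metadata column."""
    with gzip.open(metadata_path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if column not in (reader.fieldnames or []):
            raise KeyError(f"column {column!r} not found in {metadata_path}")
        # One row per sample, so each row contributes one count.
        return Counter(row[column] for row in reader)
```

The same function can be pointed at other categorical columns, such as `sex`, `library_prep` or `seq_platform`, to look for imbalances that may need to be modelled as covariates.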
Files
(37.5 MB)
Checksum | Size |
---|---|
md5:ef2509aa14b14a539f30a93e31a8626c | 16.5 MB |
md5:d9f5386bb6a20a8761b06619feb20cab | 452.5 kB |
md5:4eb7b9be356230c36e31ca987e10f912 | 14.8 MB |
md5:8fa6a81bad47eff1de9d411be0c563ca | 5.6 MB |
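The MD5 checksums listed for the files can be used to confirm that a download is complete. A minimal sketch using Python's `hashlib`; the function name `md5_matches` is hypothetical, and the path and expected checksum are supplied by the caller.

```python
import hashlib


def md5_matches(path, expected, chunk_size=1 << 20):
    """Return True if the MD5 digest of the file equals `expected`.

    `expected` may be given with or without the "md5:" prefix.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]

    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

A result of `False` usually means the download was truncated or corrupted and the file should be fetched again.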
Additional details
Related works
- Cites
- Preprint: 10.1101/2021.08.31.21262682 (DOI)