Atlas of nascent RNA transcripts reveals enhancer to gene linkages
Description
Data associated with the paper "Atlas of nascent RNA transcripts reveals enhancer to gene linkages"
GitHub repository for the analyses: https://github.com/Dowell-Lab/DBNascent_Analysis
Below are the summaries of the files associated with this publication.
1. muMerge calls for each paper used in the merging
paper_mumerge_calls
- The calls are separated by the bidirectional caller (dreg, tfit)
- In the folders are bed files (e.g. Allen2014global_hg38_dreg_MUMERGE.bed) for each paper and species (hg38, mm10)
2. Base content for regions called by dREG and Tfit in each paper in mouse and human
mumerge_base_composition
- The base content for each paper after the first round of muMerge
- The files contain the region id and the nucleotide content in the 300 bp around the region center (columns: id, cg, at)
3. Bidirectional regions called by Tfit and dREG after merging. Regions are for mouse and human datasets. (See https://github.com/Dowell-Lab/bidirectionals_merged)
bidirectional_regions
- Bidirectional regions called after muMerge and filtering
- Calls for both human and mouse datasets are reported (hg38_tfit_dreg_bidirectionals.bed.gz and mm10_tfit_dreg_bidirectionals.bed.gz)
- The bed files are in bed6 format with the following columns:
chromosome, start, stop, bidirectional id, number of papers in which the bidirectional was called, strand (always . since bidirectionals are not stranded)
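The bed6 layout above can be read with a few lines of Python. This is a minimal sketch, not part of the published analysis code: it assumes the files are tab-separated with no header row and that the fifth column is an integer paper count. The names `Bidirectional` and `parse_bed6` are hypothetical, and the demo rows are made up for illustration.

```python
import gzip
import io
from typing import Iterable, Iterator, NamedTuple


class Bidirectional(NamedTuple):
    chrom: str
    start: int
    stop: int
    name: str
    n_papers: int  # number of papers in which the region was called
    strand: str    # expected to be "." (bidirectionals are not stranded)


def parse_bed6(lines: Iterable[str]) -> Iterator[Bidirectional]:
    """Yield one record per bed6 line, skipping blank and comment lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, start, stop, name, score, strand = line.split("\t")[:6]
        yield Bidirectional(chrom, int(start), int(stop), name, int(score), strand)


# Made-up rows in the documented column order (not real data).
demo = (
    "chr1\t1000\t1300\tchr1:1000-1300\t3\t.\n"
    "chr2\t500\t900\tchr2:500-900\t1\t.\n"
)
records = list(parse_bed6(io.StringIO(demo)))

# Keep regions supported by at least two papers.
well_supported = [r for r in records if r.n_papers >= 2]
print(well_supported)
```

For the real files, the same parser can be fed from `gzip.open("hg38_tfit_dreg_bidirectionals.bed.gz", "rt")`, assuming the archive has been downloaded and unpacked locally.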
4. Metadata for samples used in the SPECS and correlation analysis
metadata
- Sample metadata for filtered samples (human_samples_QC_GC_protocol_filtered.tsv.gz) in the downstream analyses
5. SPECS scores across genes and bidirectional regions
specs_scores
- The SPECS scores for all tissues analyzed (filt_qc123_all_specs_all.txt.gz)
- The maximum SPECS scores (filt_qc123_all_specs_maxval.txt.gz)
- The minimum SPECS scores (filt_qc123_all_specs_minval.txt.gz)
- The SPECS scores were also split by disease vs non-disease samples
- The TPM summaries are also included
6. Normalized counts
normalized_counts
- Gene and bidirectional region normalized counts (gene_bidir_tpm.tsv.gz)
7. Bidirectional Region and gene pairs (See https://github.com/Dowell-Lab/bidir_gene_pairs)
bidirectional_gene_pairs
- Gene and bidirectional region pairs (dbnascent_pairs.txt.gz) across tissues in high-quality samples.
- The pairs are reported in a bed12 file, where the first 6 columns are gene coordinates and the following 6 are bidirectional coordinates.
- The remaining columns are the summary statistics for correlation and the relationship between the gene and bidirectional.
- Additional columns note whether the pair overlaps eQTLs from GTEx (eQTL) or polII ChIA-PET loops
- transcript1_chrom : Gene chromosome
- transcript1_start : Gene start coordinate
- transcript1_stop : Gene stop coordinate
- transcript_1 : Gene id
- transcript1_score : Gene score (. since none was assigned)
- transcript1_strand : Gene strand
- transcript2_chrom : Bidirectional chromosome
- transcript2_start : Bidirectional start coordinate
- transcript2_stop : Bidirectional stop coordinate
- transcript_2 : Bidirectional id
- transcript2_score : Bidirectional score (i.e. the number of papers that support a bidirectional from muMerge)
- transcript2_strand : Bidirectional strand (. since these are not stranded)
- pcc : Pearson's correlation coefficient
- pval : P-value
- adj_p_BH : Adjusted p-value (Benjamini-Hochberg correction)
- nObs : Number of observations in correlation analysis
- t : T statistic
- distance_tss : Distance between the gene start (TSS) and the bidirectional start coordinate
- distance_tes : Distance between the gene stop (TES) and the bidirectional start coordinate
- position : Whether the bidirectional is upstream or downstream of the TSS
- tissue : Tissue id based on metadata for tissue-derived correlations (labeled All_samples if all samples are used)
- percent_transcribed_both : Percent of the number of observed samples used in the analysis
- pair_id : Gene:Transcript~Bidirectional pair name
- gene_id : Gene id
- chiapet : Binary indicator for whether a pair overlaps polII ChIA-PET loops
- gtex : Binary indicator for whether a pair overlaps GTEx eQTL pairs
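The columns listed above lend themselves to simple filtering, for example keeping pairs with a significant positive correlation in one tissue. The sketch below rests on assumptions that this record does not confirm: that dbnascent_pairs.txt.gz is tab-separated, that it has a header row using the column names listed here, and that pcc and adj_p_BH parse as numbers. The function name `significant_pairs`, the thresholds, and the demo rows are hypothetical.

```python
import csv
import gzip
import io
from typing import Iterable, Iterator, Optional


def significant_pairs(
    lines: Iterable[str],
    max_adj_p: float = 0.05,
    min_pcc: float = 0.0,
    tissue: Optional[str] = None,
) -> Iterator[dict]:
    """Yield rows passing the adjusted p-value and correlation thresholds."""
    reader = csv.DictReader(lines, delimiter="\t")  # assumes a header row
    for row in reader:
        if tissue is not None and row["tissue"] != tissue:
            continue
        try:
            pcc = float(row["pcc"])
            adj_p = float(row["adj_p_BH"])
        except ValueError:
            continue  # skip rows with non-numeric entries such as "NA"
        if adj_p <= max_adj_p and pcc >= min_pcc:
            yield row


# Made-up rows with a subset of the documented columns (not real data).
demo = io.StringIO(
    "pair_id\ttissue\tpcc\tadj_p_BH\tchiapet\tgtex\n"
    "geneA~bidir1\tAll_samples\t0.82\t0.001\t1\t0\n"
    "geneA~bidir2\tAll_samples\t0.10\t0.60\t0\t0\n"
    "geneB~bidir3\tblood\t0.75\t0.01\t0\t1\n"
    "geneC~bidir4\tAll_samples\tNA\tNA\t0\t0\n"
)
hits = list(significant_pairs(demo, max_adj_p=0.05, min_pcc=0.5, tissue="All_samples"))
print([h["pair_id"] for h in hits])
```

On the real file, the same function can be applied to `gzip.open("dbnascent_pairs.txt.gz", "rt")`. Because it streams row by row, the whole table does not need to fit in memory.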
Files
Total size: 4.6 GB
md5:185efd51928a89d2e8f06b0285c0c348
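Since the download is several gigabytes, it may be worth checking it against the md5 checksum listed for this record before unpacking. The following is a minimal sketch using Python's standard `hashlib`; the function name `md5sum` and the local file path are hypothetical, as the archive's file name is not given here.

```python
import hashlib

# Checksum as listed in this record.
EXPECTED_MD5 = "185efd51928a89d2e8f06b0285c0c348"


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file at `path` matches the expected checksum."""
    return md5sum(path) == expected.lower()
```

A call such as `verify_download("path/to/downloaded_archive")` (placeholder path) returns `False` for a truncated or corrupted download.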