ENCODE ATAC-seq pipeline

Jinwook Lee; Daniel Kim; Grey Cristoforo; Chuan-Sheng Foo; Chris Probert; Nathan Beley; Anshul Kundaje

doi:10.5281/zenodo.3564813

Published December 5, 2019 | Version 1.5.4

Software Open

1. Stanford University

This pipeline is designed for automated end-to-end quality control and processing of ATAC-seq and DNase-seq data. It can be run on compute clusters with job submission engines as well as on standalone machines, and it makes use of parallelized/distributed computing. Installation is straightforward because most dependencies are installed automatically. The pipeline can be run end-to-end, from raw FASTQ files through peak calling and signal track generation, using a single caper submit command. It can also be started from intermediate stages (for example, using alignment files as input). The pipeline supports both single-end and paired-end data, as well as replicated and non-replicated datasets. Its outputs include 1) formatted HTML reports with quality control measures specifically designed for ATAC-seq and DNase-seq data, 2) analysis of reproducibility, 3) stringent and relaxed thresholding of peaks, and 4) fold-enrichment and p-value signal tracks. The pipeline also supports detailed error reporting and allows interrupted runs to be resumed easily. It has been tested on some human, mouse and yeast ATAC-seq datasets as well as on human and mouse DNase-seq datasets.

The ATAC-seq pipeline protocol specification is here. Some parts of the ATAC-seq pipeline were developed in collaboration with Jason Buenrostro, Alicia Schep and Will Greenleaf at Stanford.

Features

Portability: The pipeline can be run on cloud platforms such as Google, AWS and DNAnexus, as well as on cluster engines such as SLURM, SGE and PBS.
User-friendly HTML report: In addition to the standard outputs, the pipeline generates an HTML report containing a tabular representation of quality metrics, including alignment/peak statistics and FRiP, along with many useful plots (IDR/TSS enrichment). An example HTML report, and the JSON file used to generate it, are available with the pipeline.
Supported genomes: The pipeline needs genome-specific data such as aligner indices, a chromosome sizes file and a blacklist. We provide a genome database downloader/builder for hg38, hg19, mm10 and mm9. You can also use this builder to build a genome database from a FASTA file for your custom genome.
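
To illustrate what a chromosome sizes file contains, the sketch below derives one from FASTA text. It is not the pipeline's genome database builder. The function names chrom_sizes_from_fasta and format_chrom_sizes are hypothetical, and the two-column, tab-separated layout (sequence name, length in bases) is assumed from the common convention rather than taken from the pipeline's documentation.

```python
def chrom_sizes_from_fasta(lines):
    """Map each FASTA sequence name to its length in bases.

    `lines` is any iterable of text lines, for example an open file handle.
    The name is the first whitespace-delimited token after '>'.
    """
    sizes = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)  # sequence line: count its bases
    return sizes


def format_chrom_sizes(sizes):
    """Render sizes as 'name<TAB>length' lines (assumed conventional layout)."""
    return "".join(f"{name}\t{length}\n" for name, length in sizes.items())
```

For a custom genome stored in a file, passing the open file handle to chrom_sizes_from_fasta and writing the result of format_chrom_sizes to disk gives a file that can be compared against the one produced by the provided builder.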

Files

Files (2.0 MB)

Name	MD5	Size
atac-seq-pipeline-1.1.7.zip	446666c68ce9cbcef22dc429ef8f0f8a	663.0 kB
atac-seq-pipeline-1.4.2.zip	68bf74ce461eb87d9429cb03e95cc95d	702.1 kB
atac-seq-pipeline-1.5.4.zip	f4596f0f7beb0aae178e8926bdeb6f94	645.6 kB
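
The MD5 checksums listed above can be used to confirm that a downloaded archive is intact. The following is a minimal sketch using Python's standard hashlib module. The helper names md5_of_file and verify_download are illustrative, and the example assumes the archive was saved in the current directory under its listed name.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "atac-seq-pipeline-1.1.7.zip": "446666c68ce9cbcef22dc429ef8f0f8a",
    "atac-seq-pipeline-1.4.2.zip": "68bf74ce461eb87d9429cb03e95cc95d",
    "atac-seq-pipeline-1.5.4.zip": "f4596f0f7beb0aae178e8926bdeb6f94",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Assumed location: the archive sits in the current directory.
    name = "atac-seq-pipeline-1.5.4.zip"
    if Path(name).exists():
        print(name, "OK" if verify_download(name, EXPECTED_MD5[name]) else "MISMATCH")
    else:
        print(name, "not found")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the archive should be fetched again.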

	All versions	This version
Views	1,545	1,275
Downloads	181	142
Data volume	130.6 MB	103.0 MB
