Published March 26, 2020 | Version HT-PAMDA v.1.0
Software Open

Scripts for analyzing High-Throughput PAM Determination Assay (HT-PAMDA) experimental data for CRISPR enzymes

  • 1. Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114; Department of Pathology, Massachusetts General Hospital, Boston, MA, 02114
  • 2. Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114; Department of Pathology, Massachusetts General Hospital, Boston, MA, 02114; Department of Pathology, Harvard Medical School, Boston, MA, 02115

Description

The high-throughput PAM determination assay (HT-PAMDA) is used to comprehensively profile the protospacer-adjacent motif (PAM) preferences of a large number of CRISPR-Cas variants. The uploaded Python 2 scripts and documents enable users to analyze data generated with the HT-PAMDA method described in Walton et al. (Science, 2020).

Briefly, the HT-PAMDA analysis pipeline comprises four scripts, described below. At the top of each script, specify the appropriate input files and sample names. A comma-separated values (CSV) file is also required, containing the information shown in the provided example file (expRW086_pools_1-3_barcodes.csv). Barcodes for all samples from Walton et al. (Science, 2020) are available (Table S7 - PAMDA data summary_final.xlsx) and can be used to analyze the HT-PAMDA data uploaded to the NCBI Sequence Read Archive (SRA) under BioProject ID PRJNA605711.
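As a rough illustration of how a barcode CSV might be read before analysis, the sketch below maps barcodes to sample names. The column layout of the example file is not reproduced in this description, so the column names `sample` and `barcode`, the sample names, and the barcode sequences are purely hypothetical placeholders. The sketch is written for Python 3, whereas the uploaded scripts are Python 2.

```python
import csv
import io

# Hypothetical CSV content: the real column layout of
# expRW086_pools_1-3_barcodes.csv is NOT shown in this description.
EXAMPLE_CSV = """sample,barcode
sample_A,ACGT
sample_B,tgca
"""

def load_barcodes(handle):
    """Return a dict mapping barcode sequence -> sample name."""
    barcode_to_sample = {}
    for row in csv.DictReader(handle):
        barcode = row["barcode"].strip().upper()
        if barcode in barcode_to_sample:
            # Two samples sharing a barcode could not be told apart later.
            raise ValueError("duplicate barcode: %s" % barcode)
        barcode_to_sample[barcode] = row["sample"].strip()
    return barcode_to_sample

barcodes = load_barcodes(io.StringIO(EXAMPLE_CSV))
```

For a file on disk, `load_barcodes(open(path, newline=""))` would be the equivalent call, with `path` pointing at your own CSV.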

The four HT-PAMDA Python 2 scripts, to be run in order, are:

HT_PAMDA_1_fastqs2counts.py – takes FASTQ files and a CSV of sample barcodes as input; outputs raw read counts for each protein, spacer, PAM, and timepoint

HT_PAMDA_2_rawcounts2normcounts.py – takes raw read counts as input; outputs read counts normalized to read depth and unmodified library composition, and adjusted for the increased fractional representation of uncleaved substrates as other substrates are depleted

HT_PAMDA_3_normcounts2rates.py – takes normalized counts as input; outputs PAM depletion rates for each protein, spacer, and PAM

HT_PAMDA_4_rates2heatmaps.py – takes PAM depletion rates and the sample barcode CSV as input; outputs heatmap representations of PAM preference for each protein
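To make the first step (FASTQ files to raw counts) more concrete, here is a minimal Python 3 sketch of barcode-based read counting. It is not the code in HT_PAMDA_1_fastqs2counts.py: the read layout (barcode at bases 0-3, PAM at bases 4-7), the helper names, and the toy reads are all assumptions made for illustration, and spacer and timepoint handling is left out.

```python
import io
from collections import Counter

# Hypothetical read layout (an assumption, not the published library design):
# bases 0-3 hold the sample barcode, bases 4-7 hold the PAM.
BARCODE_SLICE = slice(0, 4)
PAM_SLICE = slice(4, 8)

def iter_fastq_sequences(handle):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:
            yield line.strip().upper()

def count_pams(handle, barcode_to_sample):
    """Count reads per (sample, PAM); reads with unknown barcodes are skipped."""
    counts = Counter()
    for seq in iter_fastq_sequences(handle):
        sample = barcode_to_sample.get(seq[BARCODE_SLICE])
        pam = seq[PAM_SLICE]
        if sample is not None and len(pam) == 4:
            counts[(sample, pam)] += 1
    return counts

# Four toy reads; the last one carries a barcode that is not in the table.
fastq_text = (
    "@r1\nACGTTGGAAAAA\n+\nIIIIIIIIIIII\n"
    "@r2\nACGTTGGACCCC\n+\nIIIIIIIIIIII\n"
    "@r3\nACGTAGGTAAAA\n+\nIIIIIIIIIIII\n"
    "@r4\nGGGGTGGAAAAA\n+\nIIIIIIIIIIII\n"
)
counts = count_pams(io.StringIO(fastq_text), {"ACGT": "sample_A"})
```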
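The normalization in the second step can be pictured as two corrections applied in sequence. The Python 3 sketch below is one possible reading of that description, not the logic of HT_PAMDA_2_rawcounts2normcounts.py: in particular, using the most-enriched PAM at each timepoint as the "uncleaved" reference is an assumption chosen for simplicity, and the function name and toy counts are hypothetical.

```python
def normalize_counts(counts_by_time):
    """Normalize raw counts for one sample.

    counts_by_time: {timepoint: {pam: raw_count}}; must contain timepoint 0
    (the unmodified library), with a nonzero count for every PAM.
    Returns {timepoint: {pam: value}}, where 1.0 means no depletion.
    """
    def fractions(counts):
        total = float(sum(counts.values()))
        return {pam: n / total for pam, n in counts.items()}

    start = fractions(counts_by_time[0])
    normalized = {}
    for t, counts in counts_by_time.items():
        frac = fractions(counts)
        # Correction 1: read depth and starting library composition.
        ratio = {pam: frac.get(pam, 0.0) / start[pam] for pam in start}
        # Correction 2 (assumed): uncleaved substrates gain fractional
        # representation as others are depleted, so rescale by the
        # most-enriched PAM, treated here as completely uncleaved.
        reference = max(ratio.values())
        normalized[t] = {pam: r / reference for pam, r in ratio.items()}
    return normalized

# Toy data: TGGA is depleted over time, TTTT is assumed to stay uncleaved.
raw = {
    0: {"TGGA": 500, "TTTT": 500},
    60: {"TGGA": 100, "TTTT": 400},
}
norm = normalize_counts(raw)
```

In the toy data, TGGA falls from a 1:1 ratio with TTTT to 1:4, so its normalized value at the later timepoint comes out at about 0.25.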
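For the third step, a depletion rate can be estimated by fitting an exponential decay to the normalized counts over time. The Python 3 sketch below assumes a single-exponential model, y(t) = exp(-k t), fitted by least squares on ln(y) through the origin. The fitting procedure actually used by HT_PAMDA_3_normcounts2rates.py may differ, and the timepoints and values here are synthetic.

```python
import math

def fit_depletion_rate(timepoints, normalized):
    """Estimate k in y = exp(-k * t) by least squares on ln(y) = -k * t.

    The line is forced through the origin because normalized counts are
    1.0 at t = 0 by construction. Non-positive values are skipped, since
    their logarithm is undefined.
    """
    num = 0.0
    den = 0.0
    for t, y in zip(timepoints, normalized):
        if t > 0 and y > 0:
            num += t * math.log(y)
            den += t * t
    if den == 0.0:
        raise ValueError("need at least one timepoint > 0 with a positive value")
    return -num / den

# Synthetic check: data generated with k = 0.01 should give back k = 0.01.
times = [0, 60, 240, 960]  # hypothetical timepoints, arbitrary units
values = [math.exp(-0.01 * t) for t in times]
k = fit_depletion_rate(times, values)
```

A log-linear fit weights low-count (noisy) late timepoints heavily, so a nonlinear fit on the untransformed values is a common alternative when the data are noisy.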
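For the fourth step, rates are typically arranged into a grid before plotting. The Python 3 sketch below assumes 4-nucleotide PAMs laid out with positions 1-2 on the rows and positions 3-4 on the columns, and log10-transformed rates. The layout, the `floor` value, and the function name are illustrative assumptions, not the behavior of HT_PAMDA_4_rates2heatmaps.py.

```python
import itertools
import math

BASES = "ACGT"

def rates_to_grid(rates, floor=1e-6):
    """Arrange rates for 4-nt PAMs into a 16x16 grid of log10 values.

    Rows are PAM positions 1-2, columns are positions 3-4, both in
    AA, AC, AG, ... TT order. Missing or very small rates are clipped
    to `floor` (a hypothetical detection limit) before the log.
    """
    dinucs = ["".join(p) for p in itertools.product(BASES, repeat=2)]
    grid = []
    for row_key in dinucs:
        row = []
        for col_key in dinucs:
            rate = max(rates.get(row_key + col_key, floor), floor)
            row.append(math.log10(rate))
        grid.append(row)
    return dinucs, grid

# Toy rates for two PAMs; every other cell falls back to the floor.
labels, grid = rates_to_grid({"TGGA": 0.01, "AAAA": 0.0001})
```

The resulting `grid` is a plain list of lists, so it can be handed to a plotting library (for example matplotlib's `imshow`) with `labels` as the tick labels on both axes.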

Files (9.1 MB)

Checksum  Size
md5:c5279bf164bc84103d6e6cf9d5d5d1da  154 Bytes
md5:f440669d9fedce2ef70972d01bbd47e8  1.7 kB
md5:6ba0be9a413e2bb2e909b1caf7eeca63  5.1 kB
md5:750857bbc73aa3ef125b07b260057456  3.4 kB
md5:0b2c92af61dbe93a6215377bd29405cc  3.1 kB
md5:97c638698f8a27eed808f773ce930498  6.5 kB
md5:1bf46de577b9c44e7af10597a110b2bf  9.1 MB
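
The MD5 checksums listed for each file can be used to confirm that a download arrived intact. A minimal Python 3 sketch using only the standard library follows; the helper names are hypothetical, and the in-memory demo stands in for a real downloaded file.

```python
import hashlib
import io

def md5_of_stream(handle, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary file-like object, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def matches_listing(handle, listed):
    """Compare a stream against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_stream(handle) == expected

# In-memory stand-in for a downloaded file. For a real download, pass
# open(path, "rb"), where path is wherever you saved the file, together
# with the matching "md5:..." string from the file listing.
demo = io.BytesIO(b"hello")
ok = matches_listing(demo, "md5:5d41402abc4b2a76b9719d911017c592")
```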