Published March 14, 2024 | Version v3.0.0
Software Open

broadinstitute/Drop-seq: Java 21, R packages for Dropulation and Census-seq

  • 1. Broad Institute
  • 2. Columbia University
  • 3. UCSF

Description

New and changed

Java 21 and Gradle

This is the first release built with Java 21. You will need to use a JVM version >= 21.
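A quick way to confirm the requirement before running the tools is to inspect the output of `java -version`. The sketch below is only an illustration: the function names are hypothetical and are not part of Drop-seq, and it assumes the usual `version "..."` string that OpenJDK and Oracle JVMs print.

```python
import re
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a version string")
    first = int(match.group(1))
    # Releases before Java 9 report themselves as 1.x (for example 1.8 is Java 8).
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major(java_cmd="java"):
    """Run the JVM found on PATH and return its major version."""
    # `java -version` writes its report to stderr, not stdout.
    result = subprocess.run(
        [java_cmd, "-version"], capture_output=True, text=True, check=True
    )
    return java_major_version(result.stderr)
```

Calling `installed_java_major() >= 21` then tells you whether the JVM on your PATH meets the requirement.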

R packages for Dropulation and Census-seq

Our Java programs for donor assignment and Census-seq have long been available, but we have typically analyzed their outputs in R to produce standard QC plots and analyses. This release is the first time we are making these R tools public, offering the community a standard set of downstream analyses for this data. They help ensure that outputs are analyzed properly and provide plots to help diagnose technical or biological issues with donor pools.

We will shortly release a new "cookbook" detailing how to run donor assignment on both scRNASeq and scATACSeq data, with more in-depth explanations of the program outputs and of how to interpret each QC plot. Please look for it on this GitHub repository's front page, next to the other cookbooks.

New program FilterReadsByUMISupport

For some analyses, it may be useful to evaluate only UMIs that are supported by at least a minimum, or at most a maximum, number of reads. This tool counts the number of reads supporting each UMI and emits a BAM file containing only the reads whose UMI support falls within the bounds of MIN_READ_SUPPORT and MAX_READ_SUPPORT.
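The filtering idea can be sketched in a few lines of Python. This is not the tool's implementation: the sketch assumes a UMI is identified by the pair (cell barcode, molecular barcode), represents reads as plain dictionaries rather than BAM records, and treats both bounds as inclusive. The function and field names are hypothetical.

```python
from collections import Counter


def filter_reads_by_umi_support(reads, min_read_support=1, max_read_support=None):
    """Keep reads whose UMI is supported by a number of reads within the bounds.

    reads: list of dicts with hypothetical keys "cell" and "umi".
    """
    # Assumption: a UMI is keyed by cell barcode plus molecular barcode.
    support = Counter((read["cell"], read["umi"]) for read in reads)
    kept = []
    for read in reads:
        n_reads = support[(read["cell"], read["umi"])]
        if n_reads < min_read_support:
            continue
        if max_read_support is not None and n_reads > max_read_support:
            continue
        kept.append(read)
    return kept
```

For example, with three reads on one UMI and one read on another, `min_read_support=2` keeps only the three reads of the well-supported UMI.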

GatherDigitalAlleleCounts changes

Tracking of alternate alleles has been improved. Multiallelic sites now report the alternate allele with the highest allele frequency instead of randomly selecting one of the alternate alleles. When POLYMORPHIC_SNPS_ONLY=false, GatherDigitalAlleleCounts (GDAC) in rare cases lost track of the alternate allele and emitted an N with 0 counts instead. This has been fixed so that the proper alternate allele and its count data are emitted.
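As a minimal illustration of the new selection rule for multiallelic sites, the sketch below picks the alternate allele with the highest allele frequency. The function name and the dictionary input are hypothetical, and the alphabetical tie-break is an assumption added only to make the example deterministic; the release notes do not say how GDAC resolves ties.

```python
def pick_alternate_allele(alt_allele_freqs):
    """Return the alternate allele with the highest allele frequency.

    alt_allele_freqs: dict mapping each alternate allele to its frequency.
    """
    if not alt_allele_freqs:
        raise ValueError("site has no alternate alleles")
    # Sorting first makes ties resolve to the alphabetically first allele
    # (an assumption for this sketch, not documented GDAC behavior).
    return max(sorted(alt_allele_freqs), key=alt_allele_freqs.get)
```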

DigitalExpression changes - smells like STARSolo

We've conducted a recent assessment of how Optimus/STARSolo expression measurements stack up against DropSeq's techniques. STARSolo outperformed DropSeq in retrieving UMIs from reads that align to multiple genomic locations but only support a single gene. To replicate this capability, simply set READ_MQ=0, allowing for the recovery of these UMIs in a similar fashion.
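The effect of the mapping-quality threshold on multi-mapped reads can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not the DigitalExpression code: each read is a list of hypothetical (mapping quality, gene) alignments, and a read supports a gene only if exactly one distinct gene remains after the quality filter.

```python
def single_supported_gene(alignments, read_mq):
    """Return the one gene supported by a read's alignments, or None.

    alignments: list of (mapq, gene) tuples; gene is None when the
    alignment does not overlap any gene.
    read_mq: minimum mapping quality an alignment needs to be considered.
    """
    genes = {
        gene
        for mapq, gene in alignments
        if mapq >= read_mq and gene is not None
    }
    # A UMI is only recoverable when the surviving alignments agree on one gene.
    return next(iter(genes)) if len(genes) == 1 else None
```

In this model, a read aligned to two locations with low mapping quality, only one of which overlaps a gene, is discarded under a strict threshold but contributes to that gene when the threshold is 0.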

STARSolo uses a different functional annotation strategy, that is, a different way of interpreting and prioritizing the annotations of which regions of the genome are exonic, intronic, and antisense. The STARSolo priority is very similar to DropSeq's, except when a read overlaps both an intron on the sense strand and a coding region on the antisense strand. In these cases, DropSeq favors the intronic interpretation, while STARSolo interprets the read as a technical artifact and labels it as coming from the antisense coding gene, so the read does not contribute to the expression counts matrix. This functional strategy is enabled in STARSolo using the flag --soloFeatures GeneFull_Ex50pAS. The difference can be summarized by this schematic:

<img width="566" alt="image" src="https://github.com/broadinstitute/Drop-seq/assets/4561831/39d8b63e-e932-48fc-ba18-87d8320304e0">
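The single case in which the two priorities disagree can be written down as a small decision function. This is a sketch of the rule described above, not the annotation code in either tool: the strategy label "DROPSEQ", the returned labels, and the (strand, feature) representation are all hypothetical, and every other overlap pattern is deliberately left unhandled.

```python
# The overlap pattern on which the two strategies disagree.
CONFLICT = {("sense", "intron"), ("antisense", "coding")}


def interpret_conflicting_read(overlaps, strategy):
    """Interpret a read that overlaps a sense intron and antisense coding sequence.

    overlaps: set of (strand, feature) tuples for the read.
    strategy: "STARSOLO" or "DROPSEQ" (the latter is a label assumed here).
    """
    if not CONFLICT <= set(overlaps):
        raise ValueError("this sketch only covers the conflicting case")
    if strategy == "STARSOLO":
        # Treated as a technical artifact: assigned to the antisense coding
        # gene and therefore not counted toward expression.
        return "antisense_coding"
    if strategy == "DROPSEQ":
        return "intronic"
    raise ValueError(f"unknown strategy: {strategy}")
```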

We have added a new parameter, FUNCTIONAL_STRATEGY, to all programs that interpret gene annotations. It can be set to interpret the annotations using either method's priority. When READ_MQ=0 and FUNCTIONAL_STRATEGY=STARSOLO, the two methods generate very similar results. The plot below shows the summed expression of each gene across cells for a single experiment, where the expression was generated by STARSolo and by DropSeq on the same set of sequencing reads aligned by STARSolo.

<img width="480" alt="image" src="https://github.com/broadinstitute/Drop-seq/assets/4561831/3bed4aed-cc76-4fa9-a370-02b4bcd1c786">

Cell barcode correction tools

These tools emulate Cell Ranger's cell barcode correction algorithm.

  1. CountBarcodeSequences tallies the appearance of cell barcodes in the input paired-end BAM. The recommended usage is to pass the list of expected barcodes via the ALLOWED_BARCODES option, so that only these cell barcodes are counted.
  2. CorrectAndSplitScrnaReadPairs corrects cell barcodes that are edit-distance 1 away from one of the barcodes in the ALLOWED_BARCODE_COUNTS file (produced by CountBarcodeSequences). The program can also split the read pairs into multiple BAM files, in which all the reads for a cell barcode are in a single BAM. This can facilitate parallel processing of the reads.
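The two steps above can be sketched in Python. This is a simplified illustration, not the algorithm used by these tools or by Cell Ranger: it considers single-base substitutions only (no insertions or deletions), and when several allowed barcodes are one substitution away it picks the one with the highest count, breaking ties alphabetically. The hash-based split is likewise an assumption about how reads could be partitioned so that each cell barcode lands in exactly one output. All function names are hypothetical.

```python
import zlib


def correct_barcode(barcode, allowed_counts):
    """Map an observed barcode to an allowed barcode, or None if uncorrectable.

    allowed_counts: dict mapping each allowed barcode to its observed count.
    """
    if barcode in allowed_counts:
        return barcode  # already an expected barcode
    candidates = []
    for i, base in enumerate(barcode):
        for alt in "ACGT":
            if alt == base:
                continue
            candidate = barcode[:i] + alt + barcode[i + 1:]
            if candidate in allowed_counts:
                candidates.append(candidate)
    if not candidates:
        return None
    # Simplification: prefer the most frequently observed candidate.
    return max(sorted(candidates), key=allowed_counts.get)


def split_index(barcode, num_splits):
    """Choose an output file index so all reads of a barcode stay together."""
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(barcode.encode("ascii")) % num_splits
```

Because the split index depends only on the corrected barcode, every read pair for a given cell is routed to the same output, which is what makes per-file parallel processing safe.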

Installation

Java

Download and unzip dropseq-3.0.0.zip. Wrapper scripts for all the command-line programs will be in the dropseq-3.0.0 directory that is created when unzipping.

R packages

Install the packages in the order below. DropSeq.dropulation depends on DropSeq.utilities, and if DropSeq.utilities is not installed first, its source version will be installed instead, which may be unstable.

install.packages("https://github.com/broadinstitute/Drop-seq/releases/download/v3.0.3/DropSeq.utilities_3.0.0.tar.gz") 
install.packages("https://github.com/broadinstitute/Drop-seq/releases/download/v3.0.3/DropSeq.dropulation_3.0.0.tar.gz") 

Files

broadinstitute/Drop-seq-v3.0.0.zip

Size: 211.5 MB
md5:1184b6fb60874d26a8e9e7a4fe040564

Additional details

Related works