milaboratory/mixcr: MiXCR 4.1
Authors/Creators
- 1. MiLaboratories
- 2. @MiLaboratory
- 3. @Canva
- 4. Sequence Software
Description
MiXCR 4.1 features two major functional upgrades:
- essential fixes and improvements for the single-cell and molecular-barcoded data processing algorithms
- new powerful set of tools for allelic variant discovery and analysis of antibody hypermutation trees
Along with these features, this release brings a radically simplified user interface, which reduces all the complexity of the repertoire analysis pipeline to a single command, where only one option, the "preset", has to be specified. MiXCR 4.1 ships with many specifically optimized presets covering most repertoire analysis cases. The upgrades introduced in this release also significantly increase the transparency of the analysis pipeline by providing a diverse set of new graphical QC reports and adding dozens of new metrics to textual and JSON reports. Additionally, this release incorporates tens of important fixes, performance optimizations and stability improvements.
Documentation portal
Along with the software release, we present a new documentation portal. It features a clean content organization, informative illustrations, in-depth guides on many real-world repertoire analysis scenarios and detailed descriptions for each of the MiXCR commands and analysis presets.
Welcome to https://docs.milaboratories.com/
Improvements for single-cell and molecular-barcoded data analysis
Based on our in-depth study of a large number of single-cell and molecular-barcoded datasets, generated with dozens of protocols and instruments in a wide range of laboratory setups, we developed several important upgrades to the algorithms involved in the analysis of tagged data. With all the improvements and fixes, MiXCR 4.1 produces clean and reliable results for the majority of popular wet-lab protocols and is robust to a wide range of protocol noise, cross-contamination mechanisms and artifacts. The set of tools offered by MiXCR 4.1 allows it to be applied to virtually any data of this type.
Featured fixes and upgrades:
- new high-performance aligner settings optimized for single-cell T- and B-cell receptor datasets
- important fixes for `assemblePartial` algorithm for tagged data
- redesign of tag correction algorithm to increase performance and decrease memory consumption
- whitelist-based barcode correction in `refineTagsAndSort` step (f/k/a `correctAndSortTags`)
- comprehensive options for data filtering, applied right after barcode sequence correction (in `refineTagsAndSort`)
- algorithms for automated threshold selection in `refineTagsAndSort` filters
- multiple improvements for consensus assembly algorithm (which pre-assembles consensuses from tagged groups in `assemble`); increased performance and stability in respect to data artifacts
- automated inference of minimal number of reads in consensus
- de-contamination filters in `assemble` to fight cross-cell contaminations
- rework of `assembleContigs` algorithm to increase robustness in respect to data artifacts
- many new QC metrics from tag pattern parsing, sequence correction, consensus to contig assembly algorithms
MiXCR 4.1 introduces two new comprehensive tools for the analysis of antibody hypermutation trees. The first is de-novo discovery of V and J gene alleles, provided by the `findAlleles` command. The second is the SHM tree reconstruction tool, provided by the `findShmTrees` command. These two features go hand in hand: together they accurately separate allelic variants from somatic mutations and reconstruct the mutation tree topology, given a set of samples from the same individual. We implemented new original algorithms for these tasks; both are based on sophisticated analysis of alignments with germline segments, rather than naive reconstruction of mutation histories regardless of the sequence structure, as implemented in other tools. This functionality is accompanied by a set of commands to export SHM trees in several formats: `exportShmTrees`, `exportShmTreesWithNodes`, `exportShmTreesNewick` and `exportPlots shmTrees`.
- For correct lineage tree reconstruction, it is critical to first have accurate V- and J-gene allele information for a particular donor or mouse strain. Hence, it is highly recommended to first run `findAlleles` and re-align all clonotype sequences (option `-o`) to a newly generated individual reference V- and J-gene library.
- `findAlleles` utilizes an allele inference algorithm which can use even somatically hypermutated clonal sequences as input data.
- Both `findAlleles` and `findShmTrees` commands support multiple `.clns` files as input, so the alleles can be inferred and lineage trees can be reconstructed using all available datasets. Note that it only makes sense to use datasets derived from an individual donor (or homogenic mouse strain) per command launch.
- All commands produce extensive reports and auxiliary tables providing additional transparency in the algorithm performance
From now on, most users can run the whole pipeline, specifying just a single option, the preset name, in addition to the input and output file names.
MiXCR provides tens of fine-tuned sets of parameters (presets) to extract repertoires from data generated with most commercially available kits and instruments, as well as with well-established open protocols, including single-cell data, bulk repertoire sequencing with or without molecular barcodes, and non-enriched data like RNA-Seq.
For example, you can run the whole analysis (from fastq to clonesets) for a dataset generated with the MiLaboratories Human TCR RNA Multiplex kit using the following command:
mixcr analyze milab-human-tcr-rna-multiplex-cdr3 input_file_R1.fastq.gz input_file_R2.fastq.gz results_prefix
This will produce a full set of intermediate files, with tsv clonesets and extensive report files both in txt and json formats.
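When several samples were prepared with the same kit, the single-command form lends itself to a simple loop. The sketch below only prints the commands (a dry run) rather than executing them. The sample names `sampleA` and `sampleB`, the helper `analyze_cmd` and the `<prefix>_R1/_R2.fastq.gz` file naming are assumptions made for illustration; the preset name is the one from the example above.

```shell
#!/bin/sh
# Dry run: print one `mixcr analyze` command per sample instead of running it.
# The preset name comes from the example above; sample names are hypothetical.
preset="milab-human-tcr-rna-multiplex-cdr3"

# Build the command line for one sample prefix
# (assumes files are named <prefix>_R1.fastq.gz and <prefix>_R2.fastq.gz).
analyze_cmd() {
  printf 'mixcr analyze %s %s_R1.fastq.gz %s_R2.fastq.gz %s\n' \
    "$preset" "$1" "$1" "$1"
}

for sample in sampleA sampleB; do
  analyze_cmd "$sample"
done
```

Once the printed commands look right, the output can be piped to `sh` to execute them.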
The preset functionality is accompanied by a set of special high-level command line options, which we call mixins, that help to adapt the selected preset if the experimental setup requires non-standard analysis (though this is not required in most cases).
The following improvements were made to MiXCR's CLI:
- `analyze` command was completely redesigned (see example above)
- mixin options were introduced; can be specified on `analyze`, `align` or, for some mixins, on other pipeline stages
- new refreshed and polished CLI help
- new safer and more reliable file name expansion mechanism, `{{a}}` and `{{R}}` pattern elements added; now one can specify `... input_file_{{R}}.fastq.gz output.vdjca` instead of `... input_file_R1.fastq.gz input_file_R2.fastq.gz output.vdjca`
- all reports and analysis parameters are now embedded into the output files and can be easily retrieved afterwards
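To make the `{{R}}` element concrete, the following sketch substitutes R1 and R2 into a pattern and prints the two file names it stands for in the example from the list. This is a plain text substitution written for illustration only (the helper name `expand_reads` is hypothetical); it is not how MiXCR resolves patterns internally, and no such helper is needed when calling MiXCR itself.

```shell
#!/bin/sh
# Illustration only: show which two mate files a {{R}} pattern stands for.
# Hypothetical helper, not part of MiXCR.
expand_reads() {
  for mate in R1 R2; do
    # Replace every literal {{R}} in the pattern with the current mate label.
    printf '%s\n' "$1" | sed "s/{{R}}/$mate/g"
  done
}

expand_reads 'input_file_{{R}}.fastq.gz'
```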
MiXCR 4.1 introduces a new exportQc command to visualize different quality control metrics including alignment performance, chain usage, reads coverage, barcode abundance distribution, automatically selected correction threshold etc.
- fix a bunch of visualization issues #743, #747, #748, #749, #750, #751
- added bar plot gene usage plots
- added gene family usage plots
- better naming for diversity and overlap measures
- rename `biophysics` to `cdr3metrics` in postanalysis
- support of svg / png and other graphical formats in `exportPlots`
- allow samples with different data types (umi/no-umi) to be used in overlapScatter when implement cutting contig results by assemble region
- introduce `--pairwise-comparisons` instead of `--hide-pairwise-comparison` in `exportPlots diversity / biophysics`
- fixed wrong sign for hydrophobicity metric in downstream analysis
- fixed incorrect behaviour of clonotype splitting by V, J and C genes
- multiple bug fixes for post analysis downsampling
- added `--show-significance` option in `exportPlots diversity / biophysics`
- fix NPE in overlap browser when some clones do not contain the gene feature specified in overlap criteria
- splitting of clones on export; there is no need to run `exportClones` command multiple times (only "by chain" option is currently implemented)
- new export fields for single-cell and molecular barcodes (i.e. `-tagFraction`)
- fixes for `--not-aligned-R1/2` option for tagged analysis
- incomplete V gene feature correction for AIRR export, if vFeatureToAlign was adjusted to exclude primer sequence from alignment
- options to export reads that were not parsed according to the tag pattern (`--not-parsed-R1/2`)
- start from BAM file
- CLI and several other parts are (re)implemented in Kotlin
- temporary files are now placed in the system temp folder by default; option to move them to the folder of output files `--use-system-temp`
- fixed bug in `assemble` report caused by pre-clone assembler which did not report `failed to extract target`
- fixed NPE in assembleContigs with disjoint features (#727)
- better ChainsUsage report (#732)
- factor-by option for overlap downstream analysis
- allow lowercase letters for tags in downstream analysis options
- default weight function now is required
- fixed bug leading to p-values not being shown for secondary grouping options
- several bug fixes with tabular export when some samples are empty
- overlap performance significantly improved
- speedup for downstream analysis routines when all samples have only one chain
- fixed NPE in assemble for libraries completely lacking records for one of the gene types
- `--summary` option added to export downsampling statistics in a tabular form for `mixcr downsample`
- fixed bug in downsampling based on UMI/Cell counts which led to totally wrong results
- fixed bug in post analysis leading to incorrect numbers in preprocessor statistics for `sumWeightBefore/After`
- new export field `-uniqueTagFraction`
- `pipelineInfo` command removed
- increased stability and speed of license verification
- `--palette` option added for plots (gene usage, overlap)
- hyphens removed from the `-positionOf` export column (#699)
- fixed bug with `--only-productive` in `exportClonesOverlap`
- more flexibility for data grouping in diversity & biophysics plots (#700)
- fix for tabular output overwriting options
- fix for exception when metadata has more samples and sample grouping is applied
We also added many new unit and integration tests to keep the quality of the software as high as possible while allowing for shorter release cycles.
Incompatible modifications & migration instructions
- binary file format for all intermediate files has changed, no backward compatibility was preserved
- subcommand `correctAndSortTags` was renamed to `refineTagsAndSort`
- export columns `-count` and `-fraction` were renamed to `-readCount` and `-readFraction` correspondingly; column names in the tsv file have also changed
- `analyze` command was completely redesigned, and has no backward compatibility with previous parameters, however the same functionality can be achieved with `generic-t(b)cr-amplicon` and `rnaseq-tcr-cdr3` and similar presets (see the docs)
- mechanism helping to avoid re-running of some MiXCR commands when re-analyzing the same dataset with the same options was dropped; we plan to reintroduce the feature in some of the future releases
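Scripts written for earlier versions need the renamed subcommand and export columns updated. The following `sed`-based sketch applies only the three renames listed above to text read from standard input. The function name `migrate_renames` and the sample command line are hypothetical; options of the redesigned `analyze` command are not covered and still have to be reviewed by hand against the docs.

```shell
#!/bin/sh
# Apply the renames listed above to text on stdin and write the result to stdout.
# Hypothetical helper; it covers only these three renames.
migrate_renames() {
  sed -E \
    -e 's/correctAndSortTags/refineTagsAndSort/g' \
    -e 's/(^|[[:space:]])-count([[:space:]]|$)/\1-readCount\2/g' \
    -e 's/(^|[[:space:]])-fraction([[:space:]]|$)/\1-readFraction\2/g'
}

# Example: an old-style export command line (file names are placeholders).
printf '%s\n' 'mixcr exportClones -count -fraction clones.clns clones.tsv' | migrate_renames
```

The option patterns require whitespace or a line boundary on both sides, so longer option names that merely begin with `-count` or `-fraction` are left untouched.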
Files
| Name | Size |
|---|---|
| milaboratory/mixcr-4.1.0.zip (md5:0b4019654301c03105c06cdc3bda89e5) | 35.2 MB |
Additional details
Related works
- Is supplement to
- https://github.com/milaboratory/mixcr/tree/4.1.0 (URL)