ncov-recombinant v0.5.1 - v0.6.0

Test Summary Package

This report was automatically generated on November 7, 2022.

Authors

Katherine Eaton | National Microbiology Laboratory, PHAC

1. Summary

The ncov-recombinant update from v0.5.1 to v0.6.0 has two major changes.

The first change is a Nextclade upgrade to the sars-cov-2 2022-10-27 dataset, which introduces recombinant sublineages for the first time (e.g. XBB.1) and two new lineages: XBD and XBE.

The second major change is the calculation and visualization of immune-related statistics. In v0.6.0, the number of key receptor binding domain (RBD) mutations is calculated for every sample. This is done by comparing the aaSubstitutions column (amino acid substitutions) produced by Nextclade to the list of 12 key RBD mutations provided by Nextstrain. In addition, the Nextclade statistics immune_escape and ace2_binding are included in the final linelists.
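The RBD comparison described above can be sketched as a set intersection. This is a minimal illustration, not the pipeline's implementation: it assumes aaSubstitutions is a comma-separated string of gene:mutation entries, the function name count_key_rbd_mutations is hypothetical, and KEY_RBD_MUTATIONS holds placeholder values rather than the actual list of 12 mutations from Nextstrain.

```python
# Placeholder values for illustration only; NOT the actual list of
# 12 key RBD mutations provided by Nextstrain.
KEY_RBD_MUTATIONS = {"S:R346T", "S:K444T", "S:N460K"}


def count_key_rbd_mutations(aa_substitutions, key_mutations=KEY_RBD_MUTATIONS):
    """Count how many key mutations occur in an aaSubstitutions string.

    Assumes a comma-separated string such as "S:R346T,S:D614G".
    """
    observed = {m.strip() for m in aa_substitutions.split(",") if m.strip()}
    return len(observed & set(key_mutations))


# Toy example: two of the four substitutions are in the placeholder set.
print(count_key_rbd_mutations("S:R346T,S:D614G,S:N460K,ORF1a:T3255I"))
```

Exact string matching is the simplest choice here; the pipeline may match differently (for example by position), so treat this only as a sketch of the idea.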

Between v0.5.1 and v0.6.0, 14.2% of sequences in the controls-gisaid dataset had different detection results. Of these, 4.0% of sequences were newly classified (NA → X*) and represent lineages not present in the v0.5.1 model, and 10.2% of sequences had sublineage assignment changes as a result of the Nextclade dataset upgrade. 0.0% of positive controls were dropped (X* → NA), indicating no observed loss in sensitivity.
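To make the categories above concrete, the following sketch classifies each sequence by its lineage assignment in the two versions and reports percentages. It is an illustration under assumptions: None stands for a sequence with no recombinant lineage (NA), the function names are hypothetical, and the pairs are toy data rather than values from controls-gisaid.

```python
from collections import Counter


def classify_change(old, new):
    """Classify one sequence by its (old version, new version) lineages."""
    if old == new:
        return "unchanged"
    if old is None:
        return "new"      # not detected before, detected now
    if new is None:
        return "dropped"  # detected before, not detected now
    return "changed"      # detected in both, assignment differs


def summarize(pairs):
    """Return the percentage of sequences in each category."""
    counts = Counter(classify_change(old, new) for old, new in pairs)
    total = len(pairs)
    categories = ("unchanged", "new", "dropped", "changed")
    if total == 0:
        return {c: 0.0 for c in categories}
    return {c: round(100 * counts[c] / total, 1) for c in categories}


# Toy data only.
pairs = [("XBB", "XBB.1"), (None, "XBD"), ("XA", "XA"), ("XE", "XE")]
print(summarize(pairs))
```

The overall share of differing results is the sum of the new, dropped, and changed categories, which matches how the figures in this report add up (4.0% + 10.2% + 0.0% = 14.2%).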

ncov-recombinant v0.6.0 is a recommended upgrade for recombinant surveillance to enable sublineage classification and to access enhanced statistics regarding immune-escape.

For a comprehensive summary of the methodological changes, please see the release notes for v0.6.0.

2. Purpose

Verify that the update of the ncov-recombinant pipeline from version 0.5.1 to 0.6.0:

  1. Maintains specificity for recombinants trained in previous versions.
  2. Increases sensitivity for newly designated recombinant sublineages.

3. Datasets

Controls (controls-gisaid)

This dataset includes SARS-CoV-2 genomes from GISAID that reflect the known diversity of recombinant sequences to date. These include 501 positive controls (recombinants) representing lineages XA - XBE, and 186 negative controls (non-recombinants) selected from the Nextstrain Reference Phylogeny.

In total, 687 control sequences were used as input and a strain list is available here.

4. Procedure

The snakemake pipelines for v0.5.1 and v0.6.0 were run independently on the same dataset (controls-gisaid). Please see the Procedure section in the Supplementary for detailed command-line instructions.

5. Results

Controls (controls-gisaid)

Note: Lineage assignments in v0.6.0 are identical to those in pango-designation and are the expected values.

Figure 1: Comparison of lineage assignments in the controls-gisaid dataset between v0.5.1 and v0.6.0.

New Detections

New detections (NA → X*) result from the following changes in v0.6.0:

  1. Nextclade dataset upgrades to include newly designated lineages XBD and XBE.

Lineage Changes

Sublineage changes result from the following updates in v0.6.0:

  1. Nextclade dataset upgrades to include XAY.1, XBB.1, XBB.1.1, XBB.2, XBC.1, and XBC.2.

Reporting Period Plots

The following plots report recombinant sequences over the last 16 weeks.

Figure 2: Reporting period timeline of recombinants in the controls-gisaid dataset in v0.5.1.
Figure 3: Reporting period timeline of recombinants in the controls-gisaid dataset in v0.6.0.
Figure 4: Reporting period breakpoint distributions by clade of the controls-gisaid dataset in v0.6.0.
Figure 5: Reporting period timeline of receptor binding domain (RBD) mutations in the controls-gisaid dataset in v0.6.0.

Supplementary

Note: Download the GISAID sequences and metadata in the strain list to data/controls-gisaid/.

Procedure

Version 0.5.1 | 799904eb

  1. Download the pipeline.

    git clone https://github.com/ktmeaton/ncov-recombinant.git 0.5.1
    cd 0.5.1
    git checkout v0.5.1
  2. Symlink the controls-gisaid data.

    rm -rf data/controls-gisaid
    ln -s ../data/controls-gisaid data/controls-gisaid
  3. Create a version-controlled conda environment.

    # Local
    mamba env create -f workflow/envs/environment.yaml -n ncov-recombinant-0.5.1
    
    # HPC
    sbatch -J conda-ncov-recombinant-0.5.1 --wrap="mamba env create -f workflow/envs/environment.yaml -n ncov-recombinant-0.5.1"
  4. Run the pipeline.

    # Local
    conda activate ncov-recombinant-0.5.1
    snakemake --profile profiles/controls-gisaid-hpc
    
    # HPC
    scripts/slurm.sh --profile profiles/controls-gisaid-hpc --conda-env ncov-recombinant-0.5.1

Version 0.6.0 | fae7bfdb

  1. Download the pipeline.

    git clone https://github.com/ktmeaton/ncov-recombinant.git 0.6.0
    cd 0.6.0
    git checkout v0.6.0
  2. Symlink the controls-gisaid data.

    rm -rf data/controls-gisaid
    ln -s ../data/controls-gisaid data/controls-gisaid
  3. Create a version-controlled conda environment.

    # Local
    mamba env create -f workflow/envs/environment.yaml -n ncov-recombinant-0.6.0
    
    # HPC
    sbatch -J conda-ncov-recombinant-0.6.0 --wrap="mamba env create -f workflow/envs/environment.yaml -n ncov-recombinant-0.6.0"
  4. Run the pipeline.

    # Local
    conda activate ncov-recombinant-0.6.0
    snakemake --profile profiles/controls-gisaid-hpc
    
    # HPC
    scripts/slurm.sh --profile profiles/controls-gisaid-hpc --conda-env ncov-recombinant-0.6.0

Comparison

After the pipelines are complete for each version, run the following to compare lineage assignments.

python3 0.6.0/scripts/compare_positives.py \
  --positives-1 0.5.1/results/controls-gisaid/linelists/positives.tsv \
  --positives-2 0.6.0/results/controls-gisaid/linelists/positives.tsv \
  --ver-1 "v0.5.1" \
  --ver-2 "v0.6.0" \
  --outdir compare/controls-gisaid \
  --node-order alphabetical \
  --min-link-size 1

New Lineages

csvtk cut -t -f "strain" 0.5.1/results/controls-gisaid/linelists/positives.tsv \
  | tail -n+2 \
  | csvtk grep -t -f "strain" -P - -v 0.6.0/results/controls-gisaid/linelists/positives.tsv \
  | csvtk cut -t -f "strain" \
  | tail -n+2 \
  | csvtk grep -t -f "strain" -P - 0.5.1/results/controls-gisaid/linelists/linelist.tsv \
  | csvtk pretty -t \
  | less -S

Dropped Lineages

csvtk cut -t -f "strain" 0.6.0/results/controls-gisaid/linelists/positives.tsv \
  | tail -n+2 \
  | csvtk grep -t -f "strain" -P - -v 0.5.1/results/controls-gisaid/linelists/positives.tsv \
  | csvtk cut -t -f "strain" \
  | tail -n+2 \
  | csvtk grep -t -f "strain" -P - 0.6.0/results/controls-gisaid/linelists/linelist.tsv \
  | csvtk pretty -t \
  | less -S
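The strain-level comparisons in the two csvtk pipelines above amount to set differences on the strain column. As a rough cross-check, the same idea can be sketched in Python. The function names here are hypothetical and the input is toy data; only the strain column name is taken from the commands above.

```python
import csv
import io


def read_strains(handle):
    """Collect the 'strain' column from a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["strain"] for row in reader}


def strain_differences(old_handle, new_handle):
    """Return (new, dropped) strains between two positives tables."""
    old = read_strains(old_handle)
    new = read_strains(new_handle)
    return sorted(new - old), sorted(old - new)


# Toy tables standing in for the two positives.tsv files.
old_tsv = io.StringIO("strain\tlineage\nA\tXBB\nB\tXA\n")
new_tsv = io.StringIO("strain\tlineage\nA\tXBB.1\nB\tXA\nC\tXBD\n")
new_strains, dropped_strains = strain_differences(old_tsv, new_tsv)
print(new_strains, dropped_strains)
```

With real data, the StringIO objects would be replaced by open() on the positives.tsv file for each version.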

Historical Plots

The following plots report all recombinant sequences.

Controls (controls-gisaid)

Figure 6: Historical timeline of recombinants in the controls-gisaid dataset in v0.5.1.
Figure 7: Historical timeline of recombinants in the controls-gisaid dataset in v0.6.0.
Figure 8: Historical timeline of receptor binding domain (RBD) mutations in the controls-gisaid dataset in v0.6.0.
Figure 9: Historical breakpoint distributions by clade of the controls-gisaid dataset in v0.6.0.