This report was automatically generated on October 3, 2022.
Katherine Eaton
| National Microbiology Laboratory, PHAC
| katherine.eaton@phac-aspc.gc.ca
The ncov-recombinant update from v0.4.2 to v0.5.0 has two major changes. The first is increased flexibility in creating and defining sc2rf modes, which allows sc2rf to run with different parameter sets for breakpoint detection. The second is a Nextclade upgrade to the sars-cov-2 2022-09-27 dataset, along with validation of all designated recombinants in this dataset (XA to XBC).
Between v0.4.2 and v0.5.0, 47.5% of sequences in the controls-gisaid dataset had different detection results. 16.2% of sequences were newly classified (NA → X*) and represent lineages not present in the v0.4.2 model. 31.3% of sequences had lineage assignment changes as a result of the Nextclade dataset upgrade and manual curation of previously published breakpoints. No positive controls (0%) were dropped between versions (X* → NA), indicating no observed loss in sensitivity.
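The change categories above can be illustrated with a minimal Python sketch. The function names `classify_changes` and `percentages`, the use of `None` to stand in for NA, and the toy strain names are assumptions made for illustration; this is not the pipeline's own comparison code.

```python
from collections import Counter


def classify_changes(old, new):
    """Label each strain by how its lineage call changed between two versions.

    `old` and `new` map strain -> lineage, with None standing in for NA.
    """
    labels = {}
    for strain in sorted(set(old) | set(new)):
        before, after = old.get(strain), new.get(strain)
        if before == after:
            labels[strain] = "unchanged"
        elif before is None:
            labels[strain] = "new"      # NA -> X*
        elif after is None:
            labels[strain] = "dropped"  # X* -> NA
        else:
            labels[strain] = "changed"  # X* -> a different X*
    return labels


def percentages(labels):
    """Percentage of strains in each category, rounded to one decimal."""
    counts = Counter(labels.values())
    total = len(labels)
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


# Toy input (hypothetical strains, not taken from controls-gisaid).
old = {"s1": "XM", "s2": None, "s3": "XN", "s4": "XE"}
new = {"s1": "XAL", "s2": "XBC", "s3": "XAR", "s4": "XE"}

labels = classify_changes(old, new)
summary = percentages(labels)
print(summary)
```

Under this scheme, the share of sequences with different detection results is the sum of the "new", "changed", and "dropped" percentages, which matches how 16.2% and 31.3% add up to 47.5% in the report.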
ncov-recombinant v0.5.0 is a strongly recommended upgrade for monitoring existing recombinants and performing routine surveillance for emerging lineages, given the high proportion of sequences (47.5%) with changed detection results.
For a comprehensive summary of the methodological changes, please see the release notes for v0.5.0.
Verify that the update of the ncov-recombinant pipeline from version 0.4.2 to 0.5.0:
controls-gisaid
This dataset includes SARS-CoV-2 genomes from GISAID that reflect the known diversity of recombinant sequences to date. These include 431 positive controls (recombinants), representing lineages XA to XBC, and 186 negative controls (non-recombinants) selected from the Nextstrain Reference Phylogeny. In total, 617 control sequences were used as input and a strain list is available here.
The snakemake pipelines for v0.4.2 and v0.5.0 were run independently on the same dataset (controls-gisaid). Please see the Procedure section in the Supplementary for detailed command-line instructions.
controls-gisaid
Note: Lineage assignments in v0.5.0 are identical to those in pango-designation and are the expected values.
New detections (NA → X*) result from the following changes in v0.5.0:
Lineage changes result from the following updates in v0.5.0:
Curation of published breakpoints.
Nextclade dataset updates.
* Why were sequences of XAL assigned to XM rather than XM-like in v0.4.2?
XAL is almost identical to XM, with the same hotspot breakpoint (17411:19954) and the same high-confidence parental lineages (BA.1.1*, BA.2*; confidence: 0.994, 0.996). Before XAL was designated, ncov-recombinant had no way to detect that sequences of XAL belonged to a distinct cluster from XM. Furthermore, XAL differs from XM by only two mutations (A2865G, G21586T), which is insufficient evidence for ncov-recombinant to call this XM-like (by default, a minimum of three mutations is required). Finally, it is unclear whether XAL emerged from a unique recombination event or is a sublineage within XM. For more information, please see pango-designation issue XAL #757.
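The "-like" rule described above can be sketched in a few lines of Python. The function `label_lineage` and the constant `MIN_DISTINGUISHING_MUTATIONS` are hypothetical names, and the logic is a simplified reading of the three-mutation default mentioned in the text, not ncov-recombinant's actual implementation.

```python
# Assumed default, taken from the text above (minimum of three mutations).
MIN_DISTINGUISHING_MUTATIONS = 3


def label_lineage(closest_lineage, distinguishing_mutations,
                  min_mutations=MIN_DISTINGUISHING_MUTATIONS):
    """Return 'X-like' only when enough mutations separate a sample from lineage X."""
    if len(set(distinguishing_mutations)) >= min_mutations:
        return f"{closest_lineage}-like"
    return closest_lineage


# The two mutations separating XAL from XM fall below the threshold.
xal_call = label_lineage("XM", ["A2865G", "G21586T"])
print(xal_call)

# A purely hypothetical third mutation would cross the threshold.
hypothetical_call = label_lineage("XM", ["A2865G", "G21586T", "C100T"])
print(hypothetical_call)
```

With only two distinguishing mutations the sketch returns `XM`, which mirrors why XAL sequences were reported as XM rather than XM-like in v0.4.2.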
† Why were sequences of XAR assigned to XN rather than XN-like in v0.4.2?
XAR and XN are handled as special cases by ncov-recombinant because their breakpoints lie at the extreme 5’ end of the genome (2834:4183), with few diagnostic alleles from a second parent (BA.1). sc2rf often cannot detect the breakpoint and parents, so before XAR was designated, ncov-recombinant could not differentiate the two lineages. For more information, please see ncov-recombinant issues XN #137, XAR #106, #74, and #90.
‡ Why were sequences of XAP assigned to XZ rather than XZ-like in v0.4.2?
XAP is almost identical to XZ, with the same hotspot breakpoint (26061:26529) and the same parental lineages (BA.2*, BA.1.1*; confidence: 0.999, 0.544). See the discussion of XAL (*) above for more information.
Note: Download the GISAID sequences and metadata for the strains in the strain list.
Note: A commit hash (37f40480) is used instead of the tag (v0.4.2) to include an important bugfix that was introduced between v0.4.2 and v0.4.3.
Download the pipeline.
git clone --recursive https://github.com/ktmeaton/ncov-recombinant.git 0.4.2
cd 0.4.2
git checkout 37f40480
Version control submodules.
cd sc2rf
git checkout 2852f05a
cd ..
Create a version-controlled conda environment.
mamba env create -f workflow/envs/environment.yaml -n ncov-recombinant-0.4.2
Create a profile for controls-gisaid.
scripts/create_profile.sh --data data/controls-gisaid --hpc
Manually change MIN_LINEAGE_SIZE in scripts/linelist.py to 5.
Run the pipeline.
scripts/slurm.sh --conda-env ncov-recombinant-0.4.2 --profile my_profiles/controls-gisaid-hpc
Download the pipeline.
git clone https://github.com/ktmeaton/ncov-recombinant.git 0.5.0
cd 0.5.0
git checkout v0.5.0
Create a version-controlled conda environment.
mamba env create -f workflow/envs/environment.yaml -n ncov-recombinant-0.5.0
Run the pipeline.
scripts/slurm.sh --conda-env ncov-recombinant-0.5.0 --profile my_profiles/controls-gisaid-hpc
After the pipelines are complete for each version, run the following to compare lineage assignments.
python3 0.5.0/scripts/compare_positives.py \
--positives-1 0.4.2/results/controls-gisaid/linelists/positives.tsv \
--positives-2 0.5.0/results/controls-gisaid/linelists/positives.tsv \
--ver-1 "v0.4.2" \
--ver-2 "v0.5.0" \
--outdir compare/controls-gisaid \
--node-order alphabetical \
--min-link-size 1
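As a rough illustration of what a version-to-version comparison has to tabulate, the sketch below counts how many sequences move between each pair of lineage calls and drops pairs below a size threshold. The function `count_links` is hypothetical, and the interpretation of `--min-link-size` as a minimum number of sequences per lineage pair is an assumption; the actual behaviour of compare_positives.py may differ.

```python
from collections import Counter


def count_links(pairs, min_link_size=1):
    """Count (lineage in version 1, lineage in version 2) pairs.

    Pairs seen fewer than `min_link_size` times are dropped (assumed semantics).
    """
    counts = Counter(pairs)
    return {link: n for link, n in counts.items() if n >= min_link_size}


# Toy lineage transitions (hypothetical, not taken from controls-gisaid).
pairs = [("XM", "XAL"), ("XM", "XAL"), ("XN", "XAR"), ("XE", "XE")]

all_links = count_links(pairs, min_link_size=1)
large_links = count_links(pairs, min_link_size=2)
print(all_links)
print(large_links)
```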
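To list strains that are in the v0.5.0 positives but not in the v0.4.2 positives, and view their records in the v0.4.2 linelist:
csvtk cut -t -f "strain" 0.4.2/results/controls-gisaid/linelists/positives.tsv \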
| tail -n+2 \
| csvtk grep -t -f "strain" -P - -v 0.5.0/results/controls-gisaid/linelists/positives.tsv \
| csvtk cut -t -f "strain" \
| tail -n+2 \
| csvtk grep -t -f "strain" -P - 0.4.2/results/controls-gisaid/linelists/linelist.tsv \
| csvtk pretty -t \
| less -S
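To list strains that are in the v0.4.2 positives but not in the v0.5.0 positives, and view their records in the v0.5.0 linelist:
csvtk cut -t -f "strain" 0.5.0/results/controls-gisaid/linelists/positives.tsv \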
| tail -n+2 \
| csvtk grep -t -f "strain" -P - -v 0.4.2/results/controls-gisaid/linelists/positives.tsv \
| csvtk cut -t -f "strain" \
| tail -n+2 \
| csvtk grep -t -f "strain" -P - 0.5.0/results/controls-gisaid/linelists/linelist.tsv \
| csvtk pretty -t \
| less -S
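The csvtk pipelines above amount to a set difference on the strain column. For readers without csvtk, a minimal Python sketch of the same idea follows. It assumes only that each table is tab-separated with a `strain` column, as the commands imply; the helper `read_strains` and the in-memory tables are illustrative stand-ins for the real positives.tsv files.

```python
import csv
import io


def read_strains(handle):
    """Return the set of values in the 'strain' column of a TSV file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["strain"] for row in reader}


# In-memory stand-ins for the two positives.tsv files (hypothetical content).
old_tsv = "strain\tlineage\ns1\tXM\ns2\tXN\n"
new_tsv = "strain\tlineage\ns2\tXAR\ns3\tXBC\n"

old_strains = read_strains(io.StringIO(old_tsv))
new_strains = read_strains(io.StringIO(new_tsv))

gained = sorted(new_strains - old_strains)  # positive only in the newer version
lost = sorted(old_strains - new_strains)    # positive only in the older version
print(gained, lost)
```

To run this on real output, replace the `io.StringIO(...)` objects with `open(path, newline="")` on each version's positives.tsv.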