
Published May 9, 2023 | Version 2.14.0
Software | Open access

nextstrain/nextclade: 2.14.0

  • 1. Biozentrum, University of Basel
  • 2. @neherlab @nextstrain
  • 3. @FredHutch / @blab / @nextstrain
  • 4. @nextstrain
  • 5. @Snyk
  • 6. University of North Carolina
  • 7. University of Freiburg
  • 8. ETH Zurich @cevo-public
  • 9. Oklahoma Medical Research Foundation
  • 10. Faculty of Computer Science, Dalhousie University
  • 11. The Francis Crick Institute

Description

Nextclade Web 2.14.0, Nextclade CLI 2.14.0 (2023-05-09)

Algorithm & Datasets: enable masked sites for distance calculation

For some viruses, genome sequencing is unreliable in specific parts of the genome, or some regions should be ignored for other reasons when calculating the distances between nodes that are used to place query sequences on the reference tree. These distances determine the optimal (smallest-distance) placement of a query sequence on the reference tree, so sequencing errors in such regions can lead to incorrect placement.

Until now, to place query sequences on the reference tree, Nextclade counted all nucleotide differences between the query and reference sequences. From this version on, sequence regions to be ignored for reference tree placement can be defined in the dataset's virus_properties.json. This is useful, for example, for SARS-CoV-2, where we will start ignoring the terminal parts of the untranslated regions. Another use case is mpox, where the two terminal repeats are intrinsically constrained to be identical: masking one of them avoids double-counting the same mutations.
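To illustrate how masked positions drop out of a difference count, here is a simplified Python sketch. It is not Nextclade's placement code (which is written in Rust and operates on tree nodes); it assumes a hypothetical representation in which each sequence is a mapping from 0-based position to nucleotide for the sites that differ from the reference, and the function names are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: {0-based position: nucleotide} for sites
# that differ from the reference sequence.
Substitutions = Dict[int, str]
# Half-open [begin, end) ranges of 0-based positions to ignore.
MaskRanges = List[Tuple[int, int]]


def is_masked(pos: int, mask: MaskRanges) -> bool:
    """Return True if pos falls inside any half-open [begin, end) range."""
    return any(begin <= pos < end for begin, end in mask)


def masked_distance(query: Substitutions, node: Substitutions, mask: MaskRanges) -> int:
    """Count sites where query and node differ, skipping masked positions."""
    distance = 0
    # Only positions where at least one side differs from the reference can differ
    for pos in set(query) | set(node):
        if is_masked(pos, mask):
            continue
        # A position missing from a mapping carries the reference nucleotide (None here)
        if query.get(pos) != node.get(pos):
            distance += 1
    return distance
```

For a query with differences at positions 10, 5000 and 29860 and a node sharing only the one at 5000, the unmasked distance is 2; with the SARS-CoV-2 ranges [(0, 50), (29850, 29902)] both remaining differences fall in masked regions and the distance becomes 0.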

PR #1128 adds this feature to Nextclade's algorithm.

Masked ranges are specified in the new field placementMaskRanges in a dataset's virus_properties.json. For example, roughly the terminal 50 nucleotides at each end of the SARS-CoV-2 genome can be ignored for tree placement by adding the following line (positions are 0-based and end-exclusive):

"placementMaskRanges":[{"begin":0,"end":50},{"begin":29850,"end":29902}],

The change is backwards compatible: if the field does not exist, Nextclade falls back to the old behavior of counting all nucleotide differences.
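The following Python sketch shows how a tool might read this field and fall back to no masking when it is absent. Only the field name placementMaskRanges and its begin/end keys come from these notes; the function names and the validation rule are illustrative assumptions. Because ranges are end-exclusive, a range covers end - begin positions: in the SARS-CoV-2 example, the first range masks 50 nucleotides (positions 0 to 49) and the second masks 52 (positions 29850 to 29901).

```python
import json
from typing import List, Tuple


def load_placement_mask(virus_properties_json: str) -> List[Tuple[int, int]]:
    """Read placementMaskRanges from a virus_properties.json document.

    Returns an empty list when the field is absent, which corresponds to the
    old behavior of counting every nucleotide difference.
    """
    props = json.loads(virus_properties_json)
    ranges = []
    for r in props.get("placementMaskRanges", []):
        begin, end = int(r["begin"]), int(r["end"])
        # Assumed sanity check: 0-based, end-exclusive, non-negative length
        if not 0 <= begin <= end:
            raise ValueError(f"invalid mask range: {r}")
        ranges.append((begin, end))
    return ranges


def masked_length(ranges: List[Tuple[int, int]]) -> int:
    """Total masked positions, assuming the end-exclusive ranges do not overlap."""
    return sum(end - begin for begin, end in ranges)
```

Note that the example line above ends with a comma because it sits inside a larger JSON object; to parse it on its own, wrap it in braces and drop the trailing comma.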

We plan to release a new version of the SARS-CoV-2 datasets that uses this feature shortly. Only a small proportion of sequences (<1%) should be affected, but where placements do change, they should be slightly more accurate.

Avoid stale software and dataset versions in Nextclade Web

It was widely reported that users with long-lived browser tabs, as well as users who rarely switch datasets, sometimes did not receive new Nextclade dataset updates, which meant that these users did not get assignments for newly designated lineages and clades.

Nextclade Web is a fully client-side, single-page application that downloads the code and the list of datasets once, when a tab is first opened. If users neither refresh the tab nor change the dataset, the same software and dataset versions are used indefinitely. Without periodic page refreshes or periodic fetching of new dataset versions, users can keep running old code on old data and receive obsolete or incomplete results.

To mitigate this problem, this version adds periodic background version checks to Nextclade Web. Roughly once a day, Nextclade Web checks whether the software version in use is the latest, and it periodically refreshes the list of available datasets and their versions. Whenever a new version of the software or of a dataset is available, users receive an update notification. The update can be accepted or dismissed (until the next version becomes available). Additionally, users can always obtain the latest code and datasets with a simple page reload in the browser (there is no need to clear the cache).
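Nextclade Web's actual implementation is client-side TypeScript and is not reproduced here. The Python sketch below only illustrates the notification decision described above, assuming plain MAJOR.MINOR.PATCH version strings without pre-release suffixes; the function names are hypothetical, and a real client would call such a check on a timer (for example, once a day).

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain 'MAJOR.MINOR.PATCH' string into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def should_notify(current: str, latest: str, dismissed: Optional[str] = None) -> bool:
    """Decide whether to show an update notification.

    A dismissed version stays silent until an even newer version appears.
    """
    if parse_version(latest) <= parse_version(current):
        return False  # already up to date
    if dismissed is not None and parse_version(latest) <= parse_version(dismissed):
        return False  # the user dismissed this version (or a newer one)
    return True
```

Comparing integer tuples rather than raw strings matters here: as strings, "2.9.0" would sort after "2.14.0".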

Nextclade is a fast-moving project in which new features and bug fixes are added frequently. We emphasize the importance of using the latest versions of both the software and the datasets to receive the most accurate and up-to-date results.

Sort empty values in the results table in Nextclade Web

Nextclade Web previously sorted the results table incorrectly when the column being sorted contained empty values. Empty values are now treated as empty strings, which fixes this issue.
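The idea of mapping empty cells to empty strings can be sketched in Python as follows. This is not Nextclade Web's code; it assumes rows are dictionaries and the sorted column holds string values (for example, clade names), and sort_rows is a hypothetical name. Without the mapping, Python would raise TypeError when comparing None with a string; with it, every row has a comparable sort key.

```python
from typing import Any, Dict, List


def sort_rows(rows: List[Dict[str, Any]], column: str, descending: bool = False) -> List[Dict[str, Any]]:
    """Sort rows by a string-valued column, treating missing or None values as ''."""

    def sort_key(row: Dict[str, Any]) -> str:
        value = row.get(column)
        # Empty cells become "" so every key is comparable with every other key
        return "" if value is None else value

    # sorted() is stable, so rows with equal keys keep their original order
    return sorted(rows, key=sort_key, reverse=descending)
```

With this key, empty cells sort before non-empty ones in ascending order and after them in descending order.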

Improved citation dialog, website copy and translation in Nextclade Web

The "Citation" dialog is now more readable and translated into multiple languages. We also added missing translations for some sentences in Nextclade Web and made the intro text on the Nextclade Web main page more relevant.

Internal changes
  • Prevent duplicated GitHub action runs in pull requests
  • Remove Red Hat 7 from tested Linux distros
  • Fix Debian repositories in CI builds for aarch64-unknown-linux-gnu architecture
  • Update master branch of the fork before making bioconda PR branch
  • Extend dev documentation
Instructions

📥 Nextclade CLI & Nextalign CLI can be downloaded from the links in the "Assets" section just below. Click "Show all" at the bottom of the "Assets" section to show more download options. Note the difference between "nextalign" and "nextclade" files as well as differences in operating systems and computer architectures.

🌐 Nextclade Web is available at https://clades.nextstrain.org

🐋 Docker images are available at DockerHub

📚 To understand how it all works, make sure to read the Documentation

Files

nextstrain/nextclade-2.14.0.zip (9.8 MB, md5:65b83b57622384cc5759e9662e3bd72d)
