nextstrain/nextclade: 2.4.0
Authors/Creators
- 1. Biozentrum, University of Basel
- 2. @neherlab @nextstrain
- 3. @FredHutch / @blab / @nextstrain
- 4. @nextstrain
- 5. @Snyk
- 6. University of North Carolina
- 7. University of Freiburg
- 8. ETH Zurich @cevo-public
- 9. Oklahoma Medical Research Foundation
- 10. Faculty of Computer Science, Dalhousie University
- 11. The Francis Crick Institute
Description
Previously, Nextclade used sequence names to identify sequences. However, sequence names have proven to be unreliable: they are often duplicated. This caused various problems; for example, results for sequences with the same name could be overwritten.
Starting with this version, Nextclade Web uses sequence indices (the order of sequences in the input file or files) to tell sequences apart uniquely. This should ensure correct handling of duplicate names. This change only affects the results table in the Web application; the CLI is not affected.
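To illustrate why names are unreliable as keys, here is a minimal shell/awk sketch. It is not Nextclade's implementation. It stores one entry per FASTA record twice, once keyed by sequence name and once keyed by sequence index, and counts how many entries survive. The function name `count_result_keys` is hypothetical, and the sketch assumes that a sequence name is the whole header line after `>`.

```bash
#!/bin/sh
# Hypothetical helper (not part of Nextclade): counts how many distinct entries
# survive when one result per FASTA record is stored keyed by name vs. by index.
# Assumption: a sequence name is the whole header line after ">".
count_result_keys() {
  awk '
    BEGIN { idx = 0 }
    /^>/ {
      name = substr($0, 2)     # header text without the leading ">"
      by_name[name] = idx      # a duplicate name overwrites the earlier entry
      by_index[idx] = name     # indices are unique, so nothing is overwritten
      idx++
    }
    END {
      n_name = 0;  for (k in by_name)  n_name++
      n_index = 0; for (k in by_index) n_index++
      printf "keyed by name: %d, keyed by index: %d\n", n_name, n_index
    }
  ' "$@"
}

# Three records, two of which share the name "a"
printf '>a\nACGT\n>b\nACGT\n>a\nTTTT\n' | count_result_keys
# prints: keyed by name: 2, keyed by index: 3
```

Keying by name silently loses the first record named `a`. Keying by position keeps all three, which is the property the index-based storage relies on.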
Feature (Web): warn about duplicate sequence names
Nextclade Web now reports duplicate sequence names. Duplicate sequence names often confuse bioinformatics tools, databases, and bioinformaticians themselves, so we are trying to encourage the community to be more thoughtful about how they name their samples.
When duplicate names are detected during analysis in Nextclade Web, the "Sequence name" column of the results table now displays a yellow "duplicates" warning icon, and its tooltip contains a list of indices of sequences (serial numbers of the sequences in the input fasta file or files) having the same name.
Note that Nextclade compares only names, not sequence data themselves.
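The duplicate check described above can be approximated outside Nextclade Web with a short shell/awk sketch. This is not the Web application's code. The function name `report_duplicate_names` is hypothetical, and indices here are 0-based in order of appearance, continuing across multiple input files. Nextclade Web's numbering may differ. As in Nextclade, only names are compared and sequence data is ignored.

```bash
#!/bin/sh
# Hypothetical helper (not part of Nextclade): prints each sequence name that
# occurs more than once, followed by the 0-based indices of the records using it.
# Indices run continuously across all input files, in order of appearance.
# Only header lines are compared; sequence data is ignored.
report_duplicate_names() {
  awk '
    BEGIN { idx = 0; n = 0 }
    /^>/ {
      name = substr($0, 2)               # header text without the leading ">"
      if (name in indices) {
        indices[name] = indices[name] ", " idx
        count[name]++
      } else {
        indices[name] = idx ""           # store the index in string form
        count[name] = 1
        order[++n] = name                # remember first-seen order for output
      }
      idx++
    }
    END {
      for (i = 1; i <= n; i++)
        if (count[order[i]] > 1)
          printf "%s: %s\n", order[i], indices[order[i]]
    }
  ' "$@"
}

# Usage: report_duplicate_names sequences.fasta [more.fasta ...]
printf '>a\nACGT\n>b\nACGT\n>a\nTTTT\n' | report_duplicate_names
# prints: a: 0, 2
```

Names with a single occurrence produce no output, so an empty result means no duplicates were found.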
Feature (CLI): add "download dataset and run" shortcut
In this version we added the --dataset-name (-d) argument to the run command. It downloads a dataset with default parameters and immediately runs the analysis with it, all in one command.
For example, this command:
nextclade run --output-all=out --dataset-name=sars-cov-2 sequences.fasta
or, equivalently, in shorter form:
nextclade run -O out -d sars-cov-2 sequences.fasta
will download the latest default SARS-CoV-2 dataset into memory and run the analysis with these dataset files. This is a convenience shortcut for the usual combination of nextclade dataset get + nextclade run. The dataset is not persisted to disk and is downloaded again on every run.
This release also includes a routine upgrade of the Auspice tree view, from version 2.37.2 to 2.37.3. You can read the changelog in the Auspice GitHub repository.
<details>
<summary><h3>Commit history</h3> (click to expand)</summary>

- [[`5da4fe5`](https://github.com/nextstrain/nextclade/commit/5da4fe540e55a8de3d4ed7cab2bdc2a7b8203423)] chore(deps): bump auspice in /packages_rs/nextclade-web

  Bumps [auspice](https://github.com/nextstrain/auspice) from 2.37.2 to 2.37.3.
  - [Release notes](https://github.com/nextstrain/auspice/releases)
  - [Changelog](https://github.com/nextstrain/auspice/blob/master/CHANGELOG.md)
  - [Commits](https://github.com/nextstrain/auspice/compare/v2.37.2...v2.37.3)

  ---
  updated-dependencies:
  - dependency-name: auspice
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...

  Signed-off-by: dependabot[bot] <support@github.com>
- [[`5a9c699`](https://github.com/nextstrain/nextclade/commit/5a9c699ee0dfe3ea68ba74290c25c24efae53de2)] feat: assert mutation is inside the gene
- [[`22e4e2d`](https://github.com/nextstrain/nextclade/commit/22e4e2d9d591c8d5969f3c0921422f216fe04ab7)] feat: add "download dataset and run" shortcut

  This adds `--dataset-name` (`-d`) to the `run` command, which allows downloading a dataset with default parameters and running with it immediately, in one command. For example, this command:

  ```bash
  nextclade run --output-all=out --dataset-name=sars-cov-2 sequences.fasta
  ```

  will download the latest default sars-cov-2 dataset into memory and run the analysis with these dataset files. This is a convenience shortcut for the usual combination of `dataset get` + `run`. The dataset is not persisted on disk and is downloaded on every run.
- [[`e155b94`](https://github.com/nextstrain/nextclade/commit/e155b94f66b1f200116f5714d72e9d06cd0cdc6d)] fix: use indices to identify sequences uniquely

  Nextclade is using sequence names to uniquely identify sequences. However, sequence names come from user inputs and cannot be trusted to be unique. Nor does there seem to be a consensus on uniqueness of sequence names in the bioinformatics community as a whole.

  This causes various problems where sequence names are used as identifiers, for example when there are multiple sequences with the same name. In particular, analysis results are effectively stored in an associative container where the sequence name acts as a key. This leads to newer results overwriting older results as they arrive during analysis. Additionally, some of the HTML `id` properties used sequence names to add uniqueness. This was leading to incorrect HTML being produced, with multiple elements having the same `id` property.

  In this PR I:
  - change internal storage to use sequence indices in the input file(s) as keys
  - add sequence index into HTML `id`s
  - display "Sequence index" in places where only sequence name was displayed previously

  This should ensure correct handling of duplicated names. This affects only the web application. In the algorithmic and CLI parts, sequence names are not used: results are stored in the form of an array, and no HTML is involved.
- [[`c43367e`](https://github.com/nextstrain/nextclade/commit/c43367ec1b5fd4eb3944a851a07bba0a1f8143dc)] Merge remote-tracking branch 'origin/master' into fix/web-unique-seq-ids
- [[`e744ca6`](https://github.com/nextstrain/nextclade/commit/e744ca673856d6dba2db2068db62bb348d32a227)] chore: release web v2.3.1
- [[`e2f37d7`](https://github.com/nextstrain/nextclade/commit/e2f37d7a928172a335646ccbe971f7ca695b3a0d)] chore: fix CHANGELOG link to PR instead of issue
- [[`2a88f30`](https://github.com/nextstrain/nextclade/commit/2a88f30f914989cf901bb12dd4890923ad722dfe)] Merge pull request #938 from nextstrain/fix/crash-gene-overflow
- [[`c92683a`](https://github.com/nextstrain/nextclade/commit/c92683ab8e2698db17325eb61c847ce83905716e)] Merge pull request #939 from nextstrain/feat/dataset-get-and-run
- [[`8a69ac7`](https://github.com/nextstrain/nextclade/commit/8a69ac764417465ca91c35e79d76276a980af7c0)] feat: remove sequence index from tooltips
- [[`ec2104a`](https://github.com/nextstrain/nextclade/commit/ec2104a2bb6ffe62871fee98c678e139332d3147)] Merge pull request #946 from nextstrain/fix/web-unique-seq-ids
- [[`92a7234`](https://github.com/nextstrain/nextclade/commit/92a723446ec7117a99176d9a02608974a796277d)] feat(web): report duplicate sequence names

  This adds little yellow icons in the "Sequence name" column when a sequence has the same name as other sequences in the same run. Indices of these sequences are additionally listed in the tooltip.
- [[`a02b60c`](https://github.com/nextstrain/nextclade/commit/a02b60c89c7a40a90702293f911eac6e83c88a55)] docs: add changelog for 2.4.0
- [[`a2c37ad`](https://github.com/nextstrain/nextclade/commit/a2c37ad637a82ed36bdcd9498cfef114850fa70e)] Merge pull request #937 from nextstrain/dependabot/npm_and_yarn/packages_rs/nextclade-web/auspice-2.37.3

  chore(deps): bump auspice from 2.37.2 to 2.37.3 in /packages_rs/nextclade-web
- [[`5cb0f15`](https://github.com/nextstrain/nextclade/commit/5cb0f153cfd5077d01d955627d627eaf30771ca3)] Merge pull request #948 from nextstrain/feat/web-report-dup-seq-names

  feat(web): report duplicate sequence names
- [[`d11f14f`](https://github.com/nextstrain/nextclade/commit/d11f14fd4985756bd01b0b8712cac538fcf20b18)] docs: extend changelog for 2.4.0
- [[`1f9b80d`](https://github.com/nextstrain/nextclade/commit/1f9b80d530f6151fee366ca97db06ebd7e06c145)] chore: release cli 2.4.0 and web v2.4.0

</details>
Files
| Name | Size | MD5 checksum |
|---|---|---|
| nextstrain/nextclade-2.4.0.zip | 9.5 MB | ca14bb2b636cf76fc3653f663b168b9f |
Additional details
Related works
- Is supplement to
  - https://github.com/nextstrain/nextclade/tree/2.4.0 (URL)