Published August 16, 2023 | Version v1
Dataset Open

Supporting data and code for: Phylogenetic identification of influenza virus candidates for seasonal vaccines

  • 1. Simon Fraser University

Description

The seasonal influenza (flu) vaccine is designed to protect against those influenza viruses predicted to circulate during the upcoming flu season, but identifying which viruses are likely to circulate is challenging. We use features from phylogenetic trees reconstructed from hemagglutinin (HA) and neuraminidase (NA) sequences, together with a support vector machine, to predict future circulation. We obtain accuracies of 0.75–0.89 (area under the curve, AUC, of 0.83–0.91) over 2016–2020. We explore ways to select potential candidates for a seasonal vaccine and find that the machine learning model has a moderate ability to select strains that are close to future populations. However, consensus sequences among the most recent three years also do well at this task. We identify similar candidate strains to those proposed by the World Health Organization, suggesting that this approach can help inform vaccine strain selection.

Notes

All data files are in CSV format. All code was written in R (open-source), and influenza trees are included in RDATA files to be read into R. Accession numbers and references to the GISAID submitting laboratories for the sequences used in this study are included in a zip folder. To recreate the analysis in full, these accession numbers may be used to download the influenza sequences directly from GISAID. 

Funding provided by: Canada Research Chairs
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100001804
Award Number:

Files

flu_vaccine-main.zip

Files (112.9 MB)

Name Size
md5:e82e462a772ecb46f75610849c9cc53a
112.9 MB
md5:78e78a7065333b8d045aab57973b3baa
9.9 kB
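
The MD5 digests listed above can be used to check that a download is intact. The following is a minimal Python sketch, not part of the deposited code (which is written in R). It assumes that the digest shown next to the 112.9 MB entry belongs to flu_vaccine-main.zip and that the archive was saved in the current working directory; the helper names `md5_of_file` and `matches_listed_checksum` are hypothetical and introduced only for illustration.

```python
import hashlib
from pathlib import Path

# MD5 digest listed on this record next to the 112.9 MB entry.
# Assumption: this entry corresponds to flu_vaccine-main.zip.
EXPECTED_MD5 = "e82e462a772ecb46f75610849c9cc53a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 digest with an expected hex digest, ignoring letter case."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("flu_vaccine-main.zip")  # assumed local download location
    if archive.exists():
        print("checksum OK" if matches_listed_checksum(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again before the R analysis is run.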