Published January 22, 2018 | Version v1
Dataset | Open access

Defining objective clusters for rabies virus sequences using affinity propagation clustering

  • 1. Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Institute of Epidemiology, Greifswald-Insel Riems, Germany
  • 2. Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Institute of Molecular Virology and Cell Biology, OIE Reference Laboratory for Rabies, WHO Collaborating Centre for Rabies Surveillance and Research, Greifswald-Insel Riems, Germany
  • 3. Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Institute of Diagnostic Virology, Greifswald-Insel Riems, Germany
  • 4. Institute of Bioinformatics, Johannes Kepler University Linz, Linz, Austria
  • 5. Institute of Mathematics and Computer Science, University Greifswald, Greifswald, Germany
  • 6. Wildlife Zoonoses and Vector-Borne Diseases Research Group, Animal and Plant Health Agency (APHA), OIE Reference Laboratory for Rabies, WHO Collaborating Centre for Characterization of Lyssaviruses, Weybridge, United Kingdom

Description

Rabies is caused by lyssaviruses and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), belonging to the prototype species Rabies lyssavirus, have been deposited in public databases. Phylogenetic analyses of these sequences, combined with metadata, suggest geographic distributions of RABV. However, such analyses face technical difficulties in defining verifiable criteria for allocating sequences to clusters in phylogenetic trees, which calls for a more rational approach. We therefore applied a relatively new mathematical clustering algorithm, affinity propagation clustering (AP), to propose a standardized sub-species classification based on full-genome RABV sequences. Because AP is computationally fast and works with any meaningful measure of similarity between data samples, it has previously been applied successfully in bioinformatics, for example to the analysis of microarray and gene expression data; cluster analysis of sequences, however, is still in its infancy. We used 516 existing and 46 original full-genome RABV sequences to demonstrate the application of AP to RABV clustering. On a global scale, AP proposed four clusters, namely the New World, Arctic/Arctic-like, Cosmopolitan, and Asian clusters, matching those previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful for confirming cluster distributions in a uniform, transparent manner, not only for RABV but also for other comparative sequence analyses.
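How AP works with "any meaningful measure of similarity" can be illustrated with a minimal pure-Python sketch of the standard responsibility/availability message-passing updates introduced by Frey and Dueck (2007). This is not the authors' pipeline: the toy "aligned" sequences, the use of negative Hamming distance as the similarity measure, the median preference, and the damping and iteration settings are all illustrative assumptions.

```python
from statistics import median


def hamming_similarity(a: str, b: str) -> float:
    """Similarity = negative Hamming distance between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return -float(sum(x != y for x, y in zip(a, b)))


def affinity_propagation(S, damping=0.5, max_iter=200, conv_iter=15):
    """Affinity propagation on a square similarity matrix S (Frey & Dueck, 2007).

    The diagonal S[k][k] holds each point's 'preference' to be an exemplar;
    higher preferences tend to yield more clusters. Returns the exemplar
    indices and, for every point, the index of its exemplar.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    exemplars, stable = [], 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else vals[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]
            for i in range(n):
                new = support if i == k else min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
        # Points with positive combined self-evidence act as exemplars.
        current = [k for k in range(n) if A[k][k] + R[k][k] > 0]
        stable = stable + 1 if current == exemplars else 0
        exemplars = current
        if exemplars and stable >= conv_iter:
            break
    if not exemplars:
        raise RuntimeError("no exemplars found; adjust preferences or iterations")
    # Assign every non-exemplar to its most similar exemplar.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


# Hypothetical toy alignment: two groups of closely related fragments.
seqs = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGAAC",
        "TTGCATGCAA", "TTGCATGCAT", "TTGCTTGCAA"]
n = len(seqs)
S = [[hamming_similarity(a, b) for b in seqs] for a in seqs]
# Shared preference = median off-diagonal similarity (a common default choice).
pref = median(S[i][j] for i in range(n) for j in range(n) if i != j)
for i in range(n):
    S[i][i] = pref
exemplars, labels = affinity_propagation(S)
print(exemplars, labels)  # [0, 3] [0, 0, 0, 3, 3, 3]
```

The preference on the diagonal controls how many clusters emerge; with the median preference in this toy case, the two groups separate into two clusters with exemplars at indices 0 and 3. Real full-genome RABV data would additionally need a multiple sequence alignment and a biologically meaningful similarity measure, which this sketch does not attempt to reproduce.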

Files (832.9 kB)

00_Defining objective clusters for rabies virus_data_overview.csv

  • md5:a7611d0ddb41b2aad9e521a959de6d43, 1.2 kB
  • md5:8391ec3d68333410cdbcab60951122d7, 43.8 kB
  • md5:8391ec3d68333410cdbcab60951122d7, 43.8 kB
  • md5:dde86dea42da3e25566985627839b708, 146.4 kB
  • md5:0960be590a2770c4b32fa8cf9c8b8635, 597.7 kB
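The MD5 checksums listed for the files can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard-library hashlib module; the helper names and the local file name in the usage comment are hypothetical, since this page does not show which file name belongs to each checksum.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file with a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected


# Hypothetical usage with a locally downloaded copy:
# matches_listed_checksum("downloaded_file.csv", "md5:a7611d0ddb41b2aad9e521a959de6d43")
```

Reading in fixed-size chunks keeps memory use constant regardless of file size, which matters little for files of this size but is the usual idiom.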

Additional details

Related works

Is supplement to
Journal article: 10.1371/journal.pntd.0006182 (DOI)