Published March 25, 2022 | Version v1
Other Open

13. Clustering of anti-neutrophil cytoplasmic antibody-associated vasculitis - using a pre-processed harmonised dataset

  • 1. Department of Clinical Sciences - Rheumatology, Lund University, Lund, Sweden
  • 2. School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland
  • 3. Department of Clinical Sciences - Nephrology, Lund University, Lund, Sweden
  • 4. Trinity Translational Medicine Institute, Trinity College Dublin, Dublin, Ireland
  • 5. Department of Medicine, University of Cambridge, Cambridge, UK

Description

Background: The sub-classification of anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (AAV) has long been debated. Unsupervised learning has previously been used to partition patients into phenotypic groups, but because AAV is a rare disease, small sample sizes have been a limiting factor. Here we attempt clustering of a small dataset harmonised to the FAIRVASC ontology, which would allow an additional 6000 AAV patients from the FAIRVASC collaboration registries to be included in the cluster model in the future. FAIRVASC is a research project seeking to federate AAV registries across Europe using semantic web technologies (https://fairvasc.eu).

Methods: This study used a dataset of 292 patients from southern Sweden, classified as having granulomatosis with polyangiitis (GPA) or microscopic polyangiitis (MPA) according to the European Medicines Agency algorithm. The dataset was converted from a relational database format to a Resource Description Framework (RDF) graph-based data model, harmonising it to a FAIRVASC standard. Factor analysis of mixed data (FAMD) and agglomerative hierarchical clustering on principal components (HCPC) were used to develop a cluster model that included organ pattern, ANCA status, serum creatinine, C-reactive protein, gender, and age at diagnosis. The generated clusters were evaluated by baseline characteristics, mortality, and renal outcome.
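The FAMD and hierarchical clustering steps described above can be illustrated with a minimal Python sketch. It is not the authors' code and makes several assumptions: the function names (famd_coordinates, cluster_patients), the number of retained components, the Ward linkage, and the synthetic placeholder data are all hypothetical, and the sketch omits any k-means consolidation step that an HCPC implementation may apply. It only shows the general idea: standardise quantitative variables, scale indicator columns of categorical variables by the square root of their proportions, run a PCA, and cluster the resulting coordinates.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def famd_coordinates(quant, cat, n_components=3):
    """Return FAMD-style principal coordinates for mixed data.

    quant: (n, q) float array of quantitative variables.
    cat:   (n, c) integer array of category codes, one column per variable.
    """
    # Quantitative variables: centre and scale to unit variance.
    z = (quant - quant.mean(axis=0)) / quant.std(axis=0)

    # Categorical variables: indicator columns divided by sqrt(proportion),
    # then centred, so rare and common categories are weighted comparably.
    blocks = []
    for j in range(cat.shape[1]):
        levels = np.unique(cat[:, j])
        ind = (cat[:, j][:, None] == levels[None, :]).astype(float)
        scaled = ind / np.sqrt(ind.mean(axis=0))
        blocks.append(scaled - scaled.mean(axis=0))

    x = np.hstack([z] + blocks)
    # PCA via SVD of the combined, pre-processed matrix.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def cluster_patients(quant, cat, n_components=3, n_clusters=5):
    """Ward hierarchical clustering on the retained FAMD components."""
    coords = famd_coordinates(quant, cat, n_components)
    tree = linkage(coords, method="ward")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return coords, labels


if __name__ == "__main__":
    # Synthetic placeholder data only; these are NOT values from the study.
    rng = np.random.default_rng(0)
    n = 292
    quant = rng.normal(size=(n, 3))        # stand-ins for creatinine, CRP, age
    cat = rng.integers(0, 2, size=(n, 4))  # stand-ins for binary features
    coords, labels = cluster_patients(quant, cat)
    print(coords.shape, np.bincount(labels)[1:])
```

In practice the number of components and clusters would be chosen from the data, for example by inspecting explained variance and the dendrogram, rather than fixed in advance as in this sketch.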

 

Results: The analyses involved data for 163 subjects with GPA and 129 with MPA. The clustering model resulted in two larger clusters and three smaller ones. The larger clusters were a predominantly anti-PR3 positive cluster of younger patients (mean age 57.5 years at diagnosis) with ear-nose-throat involvement and a favourable outcome (Cluster 1), and a predominantly anti-MPO positive cluster with severe kidney involvement and high rates of mortality and end-stage kidney disease (Cluster 5). The three smaller clusters differed in organ involvement and ANCA status at diagnosis: one had severe lung and renal involvement and a poor outcome (Cluster 3), and two had similar outcomes, one being ANCA negative (Cluster 4) and the other having peripheral nerve involvement (Cluster 2). The descriptive characteristics of the clusters are presented in Table 1.

Conclusions: Our analysis suggests five clusters of AAV patients based on baseline features, which differ in mortality and renal outcome. The investigation serves as a proof of concept of the FAIRVASC ontology and infrastructure for harmonising heterogeneous AAV datasets. In the future, the cluster model may readily include an unprecedented number of European AAV patients.

Disclosures: None

Files (343.2 kB)

md5:4f00676ec188cb65fb8606dcd797c2a3