A computational approach for the estimation of heart failure patients' status using saliva biomarkers

The aim of this work is to present a computational approach for estimating the severity of heart failure (HF) in terms of New York Heart Association (NYHA) class and for characterizing the status of HF patients during hospitalization as acute, progressive or stable. The proposed method employs feature selection and classification techniques; however, unlike the methods reported in the literature, it exploits the information provided by saliva biomarkers. The method is evaluated on a dataset of 29 patients using 10-fold cross-validation. The accuracy is 94% for the estimation of HF severity and 77% for the estimation of the status of HF patients during hospitalization.

Salivary diagnostics is an emerging field that is being integrated into disease diagnosis and monitoring, enabling timely and accurate clinical decisions for improved patient care.
Identifying biomarkers with high sensitivity and specificity for HF severity estimation, progression and mortality, such as Uric Acid, Tumor Necrosis Factor Alpha (TNF-a), a-Amylase, Lactate, Cortisol and 8-iso-prostaglandin F2a, plays a key role in patient diagnosis and prognosis.
Uric acid is linked with the action of xanthine oxidase, which is recognized as an important contributor to both the symptoms of HF and its progression [1]. HF is characterized by the activation of neurohormones and cytokines. Inflammation plays a significant role in HF, since high cytokine levels result in the clinical deterioration of HF patients [2]. In patients with congestive HF, increased cytokine concentrations have been found during the acute phase [3]. Since cytokines such as TNF-a are known to be involved in cardiac remodeling and in the activation of neurohormonal pathways, their levels provide useful information about the status of the disease [4]. In fact, it is well established that high TNF-a concentrations appear in patients with HF and that these levels are highly correlated with the patient's functional class [5].
a-Amylase is a new biomarker for assessing the activity of the sympathetic nervous system and has recently been shown to be a prominent candidate biomarker for HF [6,7]. Cortisol is known to affect cardiovascular risk factors, such as hypertension, which in turn influence survival. Cortisol binds to the mineralocorticoid receptor in the kidney and the heart and is considered a potential biomarker for monitoring HF [8]. The role of oxidative stress in congestive HF is well known [9]. 8-iso-prostaglandin F2a is an accurate marker of lipid peroxidation and, consequently, of oxidative stress [10]. Even though lactate directly reflects cellular hypoxia [11], hyperlactatemia can appear during acute HF irrespective of tissue hypoxia [12]. The hormonal and neurohormonal changes in congestive HF could be attributed to metabolic changes. Additionally, abnormalities in metabolic pathways are linked to HF severity and hormonal changes [13].
Although the above-mentioned studies demonstrate the strong correlation of saliva biomarkers with HF severity, progression and mortality, the studies reported in the literature [14,15] that address these issues with machine learning techniques do not exploit the information contained in saliva biomarkers. More specifically, they use as predictor features anamnestic data (age, gender), instrumental data (weight, systolic blood pressure, diastolic blood pressure, ejection fraction, oxygen saturation, heart rate, and ECG parameters such as atrial fibrillation, left bundle branch block and ventricular tachycardia), heart rate variability measures, blood biomarkers (Brain Natriuretic Peptide (BNP) or its amino-terminal pro-peptide equivalent (NT-proBNP), C-reactive protein (CRP), the interleukin family member ST2, hemoglobin and blood urea nitrogen (BUN)) [16], and signs and symptoms of HF.
The aim of this work is twofold: first, to estimate HF severity and patient status during hospitalization by using the above-mentioned data in combination with data extracted by saliva biosensors; and second, to address these two issues by exploiting only biomarker values. The estimation of HF severity in terms of NYHA class is addressed as a four-class classification problem, while for the estimation of patient status during hospitalization two "one-versus-rest" classification models are developed: a model discriminating the acute phase from the rest (progressive and stable) and a model discriminating the stable phase from the progressive phase.
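The two-model formulation for patient status can be sketched as follows. How the two models are combined at prediction time is not stated in the text; the cascade below (acute versus rest first, then stable versus progressive for the non-acute instances) is an assumption, and `binary_targets` and `predict_status` are hypothetical names. The classifiers are passed in as plain callables so that the sketch does not depend on any particular library.

```python
from typing import Callable, List, Sequence, Tuple

# A trained binary model: returns True for its "positive" class (hypothetical interface).
BinaryModel = Callable[[Sequence[float]], bool]


def binary_targets(labels: Sequence[str]) -> Tuple[List[bool], List[bool]]:
    """Derive training targets for the two models from three-class status labels."""
    # Model 1: acute versus rest (progressive and stable), trained on all instances.
    y_acute = [lab == "acute" for lab in labels]
    # Model 2: stable versus progressive, trained on non-acute instances only.
    y_stable = [lab == "stable" for lab in labels if lab != "acute"]
    return y_acute, y_stable


def predict_status(x: Sequence[float],
                   acute_vs_rest: BinaryModel,
                   stable_vs_progressive: BinaryModel) -> str:
    """Assumed cascade: the second model is only consulted for non-acute predictions."""
    if acute_vs_rest(x):
        return "acute"
    return "stable" if stable_vs_progressive(x) else "progressive"
```

For example, with toy threshold rules standing in for trained classifiers, `predict_status([0.3], lambda x: x[0] > 0.5, lambda x: x[0] > 0.2)` returns `"stable"`.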

A. Dataset
The proposed method is evaluated using a dataset of 29 patients collected by the clinical center of the Università di Pisa (UNIPI), Italy, within the HEARTEN project [17]. The dataset consists of patients who: (i) are diagnosed with HF (Framingham criteria) and have continuous symptoms with frequent recurrence, (ii) belong to NYHA functional classes I-IV and receive optimal treatment, (iii) have been recently hospitalized (at least once in the last six months), and (iv) have undergone one electrocardiogram in the last 12 months and have HF symptoms. Patients who are underage or who have very severe HF, obesity or advanced chronic kidney failure are excluded.
The features recorded for each patient can be grouped into the following eight categories: (i) General Information, (ii) Allergies, (iii) Medical Condition, (iv) Drugs, (v) Biological data related to HF, (vi) Clinical Examinations, (vii) Adherence and (viii) Biomarkers. The measured biomarkers are Uric Acid, TNF-a, a-Amylase, Lactate, Cortisol and 8-iso-prostaglandin F2a. In total, 65 features are recorded for each patient. The patient's health status during hospitalization is characterized as acute (day of admission), stable (day of discharge) or progressive (days between admission and discharge). According to this characterization, 29, 25 and 59 instances belong to the acute, stable and progressive classes, respectively. Since the aim of this work is to estimate the status of the patient during hospitalization and the severity of HF in terms of NYHA class, healthy subjects are not included.

B. The proposed method
The proposed method consists of three steps: (i) preprocessing, (ii) feature selection and (iii) classification. In the first step, missing values are handled. In the second step, the features that discriminate NYHA classes and HF patient status during hospitalization are identified; this step is performed only when the input to the method includes all the groups of features described in Section II-A. Finally, in the third step, classification in terms of NYHA class and HF patient status during hospitalization takes place. A detailed description of the three steps is provided below, while a flowchart of the proposed method is shown in Fig. 1.
Step 1-Pre-processing: Features with more than 60% missing values are removed, since imputation of missing values cannot be performed due to the nature of the data.
Step 2-Feature selection: It is performed using the wrapper approach in combination with the classifiers employed in step 3.
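Steps 1 and 2 can be sketched in plain Python as follows. The 60% threshold comes from the text; everything else is assumed: records are dictionaries with `None` marking a missing value, the remaining (below-threshold) missing values are left untouched, and the wrapper uses greedy forward search, which the text does not specify. `score` stands for a hypothetical callable returning, for example, the cross-validated accuracy of one of the Step 3 classifiers trained on a candidate feature subset; evaluating subsets with the classifier itself is what makes this a wrapper rather than a filter.

```python
from typing import Callable, Dict, List, Optional, Sequence

Record = Dict[str, Optional[float]]
# Hypothetical evaluator: cross-validated score of a classifier on a feature subset.
Scorer = Callable[[Sequence[int]], float]


def drop_sparse_features(records: List[Record], max_missing: float = 0.6) -> List[Record]:
    """Step 1: remove features whose fraction of missing values exceeds max_missing."""
    if not records:
        return []
    names = {name for rec in records for name in rec}
    keep = set()
    for name in names:
        # A feature absent from a record counts as missing for that record.
        missing = sum(1 for rec in records if rec.get(name) is None)
        if missing / len(records) <= max_missing:
            keep.add(name)
    return [{k: v for k, v in rec.items() if k in keep} for rec in records]


def forward_wrapper_selection(n_features: int, score: Scorer,
                              max_features: Optional[int] = None) -> List[int]:
    """Step 2 (assumed greedy forward search): add the feature that most improves score."""
    selected: List[int] = []
    remaining = list(range(n_features))
    best = float("-inf")
    limit = n_features if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Evaluate the classifier on every one-feature extension of the current subset.
        scores = {f: score(selected + [f]) for f in remaining}
        candidate = max(remaining, key=scores.__getitem__)
        if scores[candidate] <= best:
            break  # no extension improves the score: stop
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    return selected
```

With a toy scorer such as `lambda s: sum([2, 10, 0, 6][f] for f in s) - len(s)`, the search selects features `[1, 3, 0]` and then stops, because adding feature 2 no longer raises the score.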

III. RESULTS
The proposed method is applied in four cases: case 1) estimation of HF severity using the features of all eight groups described in Section II-A; case 2) estimation of HF severity using only biomarkers (group viii); case 3) estimation of patient status during hospitalization using all the groups of features; case 4) estimation of patient status during hospitalization using only biomarkers. Feature selection is applied in cases 1 and 3.
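The four experimental cases can be summarised as a small configuration table. The feature-selection and SMOTE flags follow the text (feature selection in cases 1 and 3, SMOTE in cases 3 and 4); the `ExperimentCase` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

ALL_GROUPS: Tuple[str, ...] = ("i", "ii", "iii", "iv", "v", "vi", "vii", "viii")
BIOMARKERS_ONLY: Tuple[str, ...] = ("viii",)


@dataclass(frozen=True)
class ExperimentCase:
    case_id: int
    target: str                      # "nyha" (four classes) or "status" (hospitalization phase)
    feature_groups: Tuple[str, ...]  # feature groups of Section II-A
    feature_selection: bool          # wrapper selection (Step 2) enabled
    smote: bool                      # SMOTE on training folds enabled


CASES = [
    ExperimentCase(1, "nyha", ALL_GROUPS, feature_selection=True, smote=False),
    ExperimentCase(2, "nyha", BIOMARKERS_ONLY, feature_selection=False, smote=False),
    ExperimentCase(3, "status", ALL_GROUPS, feature_selection=True, smote=True),
    ExperimentCase(4, "status", BIOMARKERS_ONLY, feature_selection=False, smote=True),
]
```

A driver loop over `CASES` would then run the three steps of the method once per case, skipping Step 2 whenever `feature_selection` is false.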
The models that provide the best results for the estimation of HF severity and patient status during hospitalization are presented in Table II and Table III, respectively. The results are expressed in terms of accuracy (ACC), positive predictive value (PPV), sensitivity (SENS), specificity (SPEC), area under the ROC curve (AUC) and F-measure (FM). The classifiers are evaluated with 10-fold stratified cross-validation. To address the class imbalance problem (cases 3 and 4), the synthetic minority over-sampling technique (SMOTE) is applied to the training set of each fold within the 10-fold cross-validation procedure.
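The key point of this evaluation protocol is that SMOTE only ever sees training data, so no synthetic sample derived from a test instance can leak into the test fold. A dependency-free sketch is given below. The round-robin stratification, the choice of k = 5 neighbours, and oversampling the minority class up to the size of the largest class are assumptions, since the text does not state the SMOTE parameters; `fit_predict` is a hypothetical callable that trains a classifier and returns predictions for the test fold.

```python
import random
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical: trains on (X_train, y_train) and returns labels for X_test.
FitPredict = Callable[[List[Vector], List[str], List[Vector]], List[str]]


def stratified_folds(labels: Sequence[str], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split indices into k folds, dealing each class round-robin to keep class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    pos = 0
    for lab in sorted(by_class):
        members = by_class[lab]
        rng.shuffle(members)
        for i in members:
            folds[pos % k].append(i)
            pos += 1
    return folds


def smote(X: List[Vector], y: List[str], minority: str, n_new: int,
          k: int = 5, seed: int = 0) -> Tuple[List[Vector], List[str]]:
    """Add n_new synthetic minority samples on segments between minority neighbours."""
    rng = random.Random(seed)
    idx = [i for i, lab in enumerate(y) if lab == minority]
    if len(idx) < 2 or n_new <= 0:
        return X, y

    def dist2(u: Vector, v: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(u, v))

    synthetic: List[Vector] = []
    for _ in range(n_new):
        b = rng.choice(idx)
        # k nearest minority neighbours of the base sample (excluding itself).
        neighbours = sorted((i for i in idx if i != b), key=lambda i: dist2(X[i], X[b]))[:k]
        n = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([xb + gap * (xn - xb) for xb, xn in zip(X[b], X[n])])
    return X + synthetic, y + [minority] * n_new


def cv_with_smote(X: List[Vector], y: List[str], fit_predict: FitPredict,
                  minority: str, k: int = 10, seed: int = 0) -> List[str]:
    """Out-of-fold predictions; SMOTE is applied to the training part of each fold only."""
    preds: List[str] = [""] * len(y)
    for test_idx in stratified_folds(y, k, seed):
        if not test_idx:
            continue
        test_set = set(test_idx)
        X_tr = [X[i] for i in range(len(y)) if i not in test_set]
        y_tr = [y[i] for i in range(len(y)) if i not in test_set]
        largest = max(y_tr.count(c) for c in set(y_tr))
        X_tr, y_tr = smote(X_tr, y_tr, minority, largest - y_tr.count(minority), seed=seed)
        for i, p in zip(test_idx, fit_predict(X_tr, y_tr, [X[i] for i in test_idx])):
            preds[i] = p
    return preds
```

Because the test fold is removed before `smote` is called, every synthetic point is interpolated between two training instances only.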
Additionally, the NT-proBNP blood biomarker is added as a feature to the datasets that consist only of saliva biomarkers, and the models are rebuilt. The results for this case are presented in Table IV.
For the estimation of patient status during hospitalization, further "one-versus-rest" models are built in order to identify the best one. The results are presented in Table V. The best results are obtained with the LMT classifier when biomarkers are combined with the other groups of features, and with the RF classifier when only biomarker features are used, with accuracies of 58% and 56%, respectively.
According to the experts, the progressive phase can be divided into two sub-phases: the phase in which the patient starts overcoming the HF event and the phase in which the patient is close to becoming stable. Applying the proposed method to discriminate between these two sub-phases yields 68% accuracy when the full set of features (groups i-viii) is employed and 61% accuracy when only saliva biomarkers are used.

IV. DISCUSSION
An automated method for the estimation of HF severity and of the in-hospital status of HF patients, using information from saliva biomarkers, is presented. HF severity estimation, in terms of NYHA class, is addressed as a four-class classification problem, while the patient status (acute, progressive, stable) is addressed through two binary classification problems: the first model discriminates acute instances from non-acute (progressive and stable) ones, and the second discriminates stable instances from progressive ones. Two runs are performed for each classification problem (HF severity estimation and patient status estimation): one in which all the groups of features described in Section II-A are employed (cases 1 and 3) and one in which only biomarkers are used (cases 2 and 4). In all cases, nine classifiers are tested. Feature selection is applied only in cases 1 and 3, while the SMOTE resampling technique is employed in the classification step where needed (cases 3 and 4).
Although the estimation of HF severity has already been addressed in the literature, this is the first time that saliva biomarkers are employed for this purpose. The obtained accuracy is 94% (case 1) and 74% (case 2), and it increases slightly when saliva biomarkers are combined with the blood biomarker NT-proBNP. This observation will be further examined in future work by applying the proposed method to a larger dataset and by incorporating further blood biomarkers that the literature reports as indicative of HF severity.
The in-hospital estimation of patient status is also addressed using: (i) saliva biomarkers in combination with anamnestic and instrumental data, (ii) only saliva biomarkers and (iii) saliva biomarkers in combination with NT-proBNP. Instances are classified as acute or non-acute with 85% accuracy when all features are employed and 74% accuracy when only saliva biomarkers are used, while the accuracy increases to 76% when both saliva and blood biomarkers are given as input to the proposed method. The non-acute instances are further classified as progressive or stable with 70% accuracy. The results change when stable-versus-non-stable and acute-versus-progressive models are built (Table V). The patients were followed for three to seven days, and measurements were taken every second day; this reduced the number of progressive instances and made their discrimination from stable and acute instances difficult, which may explain the low SPEC values. In the future, the estimation of patient status during hospitalization will be addressed as a three-class classification problem (acute vs. progressive vs. stable), once more data from the three classes are available. Furthermore, the proposed method will be evaluated using a leave-one-out approach.
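To see how a small number of instances in one class can depress SPEC while ACC stays high, the binary metrics reported in the tables can be computed from the four confusion-matrix counts. This is a minimal sketch: the F-measure is taken as F1 (the harmonic mean of PPV and SENS), AUC is omitted because it requires ranked classifier scores rather than counts, a zero denominator is mapped to 0.0 by convention, and `binary_metrics` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    acc: float
    ppv: float
    sens: float
    spec: float
    f_measure: float


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Compute ACC, PPV, SENS, SPEC and F1 from confusion-matrix counts."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # assumed convention for empty denominators

    ppv = ratio(tp, tp + fp)    # positive predictive value
    sens = ratio(tp, tp + fn)   # sensitivity (recall)
    spec = ratio(tn, tn + fp)   # specificity
    f1 = ratio(2 * ppv * sens, ppv + sens)
    acc = ratio(tp + tn, tp + fp + tn + fn)
    return BinaryMetrics(acc, ppv, sens, spec, f1)
```

For hypothetical counts, `binary_metrics(tp=40, fp=4, tn=3, fn=5)` gives ACC ≈ 0.83 and SENS ≈ 0.89 but SPEC ≈ 0.43: with only seven negative instances, each false positive lowers SPEC by about 14 percentage points.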
According to the current study, the saliva biomarkers that contribute most to the estimation of patient status during hospitalization are TNF-a, a-Amylase, cortisol and lactate. The first two (TNF-a and a-Amylase) contribute mainly to the discrimination of acute versus progressive instances, while the last two (cortisol and lactate) contribute mainly to the discrimination of progressive versus stable instances. These results will be further validated in the future, and new observations will be drawn when more saliva and breath biomarker data are collected within the HEARTEN project.

V. CONCLUSIONS
HF severity estimation in terms of NYHA class, as well as estimation of patient status, is addressed through machine learning techniques and data expressing signs and symptoms of HF, blood biomarkers and, for the first time, saliva biomarkers. Both issues addressed in this study can contribute to the early prediction of adverse events and to the better management of HF patients, thus reducing the severe consequences of HF in terms of cost and quality of life.
The 65 features are recorded every second day from the first day of the patient's hospitalization until discharge, yielding a set of 113 instances. The dataset includes 14 instances in NYHA class I, 26 in NYHA class II, 31 in NYHA class III and 42 in NYHA class IV (Table I).

Figure 1: Flowchart of the proposed method.

TABLE I: DATASET DESCRIPTION

TABLE II: RESULTS OF THE PROPOSED METHOD FOR THE ESTIMATION OF HF SEVERITY
Model: Random Forests without feature selection

TABLE III: RESULTS OF THE PROPOSED METHOD FOR THE ESTIMATION OF PATIENT STATUS DURING HOSPITALIZATION

TABLE IV: RESULTS OF THE PROPOSED METHOD USING BLOOD AND SALIVA BIOMARKERS

TABLE V: ACCURACY OF THE PROPOSED METHOD FOR THE ESTIMATION OF PATIENT STATUS USING DIFFERENT "ONE VERSUS REST" MODELS