Outlier Detection for Foot Complaint Diagnosis: Modeling Confounding Factors Using Metric Learning

Diagnosing foot complaints using plantar pressure videos is complicated by the presence of confounding factors (e.g., age, weight). Outlier detection could help with diagnosis, but these confounding factors result in data that are not independent and identically distributed (IID) with respect to a specific patient. To address this non-IID problem, we propose modeling the confounding factors using metric learning. A distance metric is learned on the confounding factors in order to model their impact on the plantar pressures. This metric is then used to weight plantar pressures from healthy controls when generating a patient-specific statistical baseline. Statistical parametric mapping is then used to compare the patient to this statistical baseline. We show that using metric learning reduces variance in these statistical baselines, which in turn improves the sensitivity of the outlier detection. These improvements in outlier detection bring us one step closer to accurate computer-aided diagnosis of foot complaints.

In diagnostic medicine, it is well established that a health condition can often produce different symptoms in different people. Therefore, it can be challenging or inappropriate to model a patient population as a single homogeneous group, and an outlier detection approach is preferred instead. This is equally the case when assessing foot complaints. 2 When assessing foot complaints, a gait analysis is performed with pressure-sensing plates, 11 wearable sensors, 1 or cameras 13 recording the patient's walk. In this work, we consider videos of the plantar pressures measured from the bottom of a person's foot as they walk over a pressure-sensing plate. While previous works have described anomalies in these plantar pressures at the group level, it is also known that different patients have different abilities to cope with foot complaints, 11 suggesting that plantar pressure anomalies are likely patient-specific. Outlier detection is therefore desired to identify whether anomalies exist in an individual's plantar pressure video and, if so, when and where. By localizing these abnormalities, it is hoped that a better diagnosis of foot complaints can be achieved. 2
One popular approach to outlier detection in the medical domain is to statistically model healthy controls, then use this model as a baseline to which individual patients can be compared. 6 Patients who differ significantly from this baseline are identified as having an anomaly. For medical imaging applications like our plantar pressure video analysis, outlier detection is regularly combined with statistical parametric mapping (SPM) to localize anomalies in both space and time. 2,4
SPM-style outlier detection typically follows an established workflow. First, the plantar pressure videos are brought into spatiotemporal alignment. 3 At each pixel in each frame, the plantar pressures are then statistically modeled using Normal distributions, resulting in a statistical baseline defined by a mean plantar pressure video and a standard deviation video. Subsequently, a patient's plantar pressure video is spatiotemporally aligned to the baseline's mean video, and single-sample t-tests are computed as the outlier scores at each pixel. Finally, random field theory is used to identify whether the patient's plantar pressures are statistically significant outliers of the baseline distributions. 10 This SPM-style outlier detection allows anomalies to be localized to specific anatomical structures and specific time points in the footstep.
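For reference, one textbook form of the single-sample t statistic that compares a single new observation with a sample of healthy controls is given below; it is background only, and the exact variant used in the cited works may differ. Writing $V_i(x,t)$ for the plantar pressure of healthy control $i$ at pixel $x$ and frame $t$, and $\bar{V}(x,t)$ and $s(x,t)$ for the pixel-wise sample mean and standard deviation over $N$ controls, a patient value $V_{\text{test}}(x,t)$ drawn from the same Normal distribution satisfies

$$T(x,t) = \frac{V_{\text{test}}(x,t) - \bar{V}(x,t)}{s(x,t)\sqrt{1 + 1/N}} \sim t_{N-1}, \qquad s^2(x,t) = \frac{1}{N-1}\sum_{i=1}^{N}\bigl(V_i(x,t) - \bar{V}(x,t)\bigr)^2.$$

The $\sqrt{1 + 1/N}$ factor accounts for the uncertainty in $\bar{V}$ itself. For large $N$ it tends to 1 and $t_{N-1}$ approaches the standard Normal distribution, so $T$ then behaves like a z-score.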
While this approach to outlier detection has shown promising results in other applications, 4 one property that limits its application to plantar pressure videos is that the computation of mean and standard deviation videos assumes that each healthy control is independent and identically distributed (IID) from the same population as the patient under examination. Unfortunately, this is not the case for plantar pressures. It is well established that demographic factors like weight, age, sex, height, and shoe size impact plantar pressures in ways that are unrelated to known foot complaints. 7 As a result, we have a contextual outlier detection problem in which the demographic features define the context under which plantar pressure normality should be judged. While multiple contextual outlier detection techniques exist, 2,8,15 they are either limited to linear regression models 2 or have yet to show compatibility with the SPM framework. Specifically, SPM requires that the contextual outlier detection algorithm produce a statistical measure (e.g., a t-statistic or F-statistic) as its outlier score, so that random field theory can still be used to establish the threshold at which a pixel becomes an outlier. In this article, we propose integrating contextual outlier detection into SPM for the purpose of detecting outliers in plantar pressure videos. We base our approach on the idea that the plantar pressures from our non-IID healthy controls should be weighted according to the demographic similarity between each control subject and the patient under examination. As in Zheng et al., 15 these similarities are modeled using metric learning in order to manage the relative importance of each demographic factor on the statistical baseline. We hypothesize that this metric learning approach will produce more numerically accurate statistical baselines to which patients can be compared, resulting in more reliable anomaly detection.

METHODS
Consider a set of plantar pressure videos $\{V_1, \ldots, V_N\}$ sampled non-IID from $N$ healthy controls. For each healthy control, we also assume that we have measured the confounding factors of age, sex, weight, height, and shoe size. Let $\{y_1, \ldots, y_N\}$ be column vectors containing these demographics. Similarly, let $V_{\text{test}}$ and $y_{\text{test}}$ be the plantar pressure video and demographics of the patient to be evaluated. We assume that these plantar pressure videos have already been spatially and temporally aligned using STAPP. 3

Patient-Specific Baselines
In previous works, 4 healthy controls are assumed to be sampled IID from the same population as the patient. As a result, the baseline mean $M$ and standard deviation $S$ videos are the same for all patients and are given by

$$M(x,t) = \frac{1}{N}\sum_{i=1}^{N} V_i(x,t), \tag{1}$$

$$S(x,t) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(V_i(x,t) - M(x,t)\bigr)^2}, \tag{2}$$

where $x$ is a pixel location and $t$ a time frame in the video. In practice, however, the healthy controls are not IID samples: a person's age, weight, height, sex, and shoe size all influence their plantar pressures. We therefore model this as a contextual outlier detection problem in which these demographic factors provide the necessary context, and we weight each healthy control differently in the creation of the statistical baseline. The demographic factors define these weights, which accounts for the non-IID sampling of the healthy controls. This approach takes inspiration from how an artist mixes different amounts of different paints to produce a new color (see Figure 1). Serag et al. 12 proposed a similar idea using a single demographic measurement (age), but their use of a single demographic factor makes defining similarity between individuals trivial.
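The IID baseline of Equations (1) and (2) amounts to a pixel-wise mean and standard deviation over the control axis. A minimal NumPy sketch, assuming the aligned videos are stacked into one array whose first axis indexes the $N$ healthy controls (a hypothetical layout, e.g. controls × rows × columns × frames); the function name is illustrative:

```python
import numpy as np


def iid_baseline(videos):
    """Pixel-wise IID baseline over healthy controls.

    videos: array of shape (N, ...) with axis 0 indexing the N aligned
    healthy-control videos (assumed layout); the remaining axes index the
    pixel location x and the time frame t.
    Returns (M, S), each with the shape of a single video.
    """
    videos = np.asarray(videos, dtype=float)
    M = videos.mean(axis=0)          # Equation (1)
    S = videos.std(axis=0, ddof=0)   # Equation (2), 1/N normalization
    return M, S
```

Because every patient is compared against the same $M$ and $S$ under this IID assumption, the baseline only needs to be computed once.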
In contrast to the IID outlier detection approach in Equations (1) and (2), we construct the baseline pixel-by-pixel Normal distributions using a global weighted kernel regression, resulting in mean and standard deviation videos defined as

$$M_{\text{test}}(x,t) = \frac{\sum_{i=1}^{N} w(y_i, y_{\text{test}})\, V_i(x,t)}{\sum_{i=1}^{N} w(y_i, y_{\text{test}})}, \tag{3}$$

$$S_{\text{test}}(x,t) = \sqrt{\frac{\sum_{i=1}^{N} w(y_i, y_{\text{test}})\bigl(V_i(x,t) - M_{\text{test}}(x,t)\bigr)^2}{\sum_{i=1}^{N} w(y_i, y_{\text{test}})}}, \tag{4}$$

where the weights are obtained from a Gaussian kernel over the distances between the contextual demographic factors:

$$w(y_i, y_j) = \exp\bigl(-\mathrm{dist}(y_i, y_j)\bigr). \tag{5}$$

This kernel regression ensures that healthy controls whose demographics are similar to the patient's receive higher weights for their plantar pressure videos than less similar healthy controls. However, this regression alone does not specify how distances between demographic factors should be measured in order to define an accurate statistical baseline for an individual. For example, a person's weight has a greater impact on plantar pressures than a person's age, 7 and this should be captured in the distance metric, but to what degree? How do we place a number on this?
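The weighted baseline can be sketched in the same style. This is a minimal illustration, assuming the same (N, ...) video layout, demographics stacked as an (N, D) array, and an already known positive semidefinite matrix `P` for the Mahalanobis-style distance $(y_i - y_j)^T P (y_i - y_j)$; the function names are hypothetical. Subtracting the smallest distance before exponentiating is a numerical-stability choice of this sketch: it rescales all weights by the same factor, which cancels in the normalized sums.

```python
import numpy as np


def kernel_weights(Y, y_test, P):
    """Unnormalized weights proportional to exp(-dist(y_i, y_test)), as in Equation (5)."""
    diff = np.asarray(Y, dtype=float) - np.asarray(y_test, dtype=float)  # (N, D)
    dist = np.einsum("nd,de,ne->n", diff, P, diff)                        # (N,)
    # Shift by the minimum distance to avoid underflow; the common factor
    # exp(min dist) cancels once the weights are normalized.
    return np.exp(-(dist - dist.min()))


def weighted_baseline(videos, Y, y_test, P):
    """Patient-specific mean and standard deviation videos, Equations (3) and (4)."""
    videos = np.asarray(videos, dtype=float)
    w = kernel_weights(Y, y_test, P)
    w = w / w.sum()                                           # normalized weights
    M_test = np.tensordot(w, videos, axes=1)                  # Equation (3)
    var = np.tensordot(w, (videos - M_test) ** 2, axes=1)     # Equation (4), squared
    return M_test, np.sqrt(var)
```

Setting `P` to the zero matrix gives every control the same weight and recovers the IID baseline of Equations (1) and (2), which makes a quick sanity check for an implementation. A very large `P` concentrates the weight on the demographically nearest control.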

Metric Learning
To address this question, we developed a high-dimensional extension of the metric learning for kernel regression (MLKR) algorithm originally proposed by Weinberger and Tesauro, 14 in which the regression targets are entire plantar pressure videos. MLKR has been used for contextual outlier detection before, 15 but here we combine it with the statistical baseline above in order to integrate it within the SPM framework. Conceptually, MLKR seeks the distance metric under which kernel regression best reconstructs the plantar pressure videos. This objective is captured mathematically through the leave-one-out regression loss function

$$\mathcal{L} = \sum_{i=1}^{N}\bigl\lVert V_i - \hat{V}_i \bigr\rVert_2^2, \qquad \hat{V}_i = \frac{\sum_{j \neq i} w(y_i, y_j)\, V_j}{\sum_{j \neq i} w(y_i, y_j)}, \tag{6}$$

which incorporates $w(\cdot)$, the kernel over demographic distances defined in Equation (5). Effectively, this approach learns a distance metric that emphasizes each demographic factor according to the degree to which it influences the plantar pressure videos. A Euclidean norm is used here because each plantar pressure dataset is assumed to be an equally reliable estimate of the gait of a healthy individual. While MLKR allows for a variety of distance metrics, we follow Weinberger and Tesauro 14 and model the distances using a Mahalanobis-style metric

$$\mathrm{dist}(y_i, y_j) = (y_i - y_j)^{T} P\, (y_i - y_j), \tag{7}$$

where the positive semidefinite matrix $P$ captures the relative influence of each demographic factor on the plantar pressure videos, as well as the interactions between factors.
Metric learning is used to estimate how much each demographic factor should impact the creation of the statistical baseline.
To estimate $P$, we first replace it by its Cholesky decomposition $P = LL^{T}$ in order to preserve its positive semidefinite structure. The matrix $L$ is then solved for by inserting it into Equation (7), inserting Equation (7) into Equation (5), inserting Equation (5) into Equation (6), and minimizing the resulting regression loss $\mathcal{L}$ with respect to $L$. This loss function is differentiable but non-convex in $L$, so it is minimized to a local optimum using, among other techniques, gradient descent. Writing $y_{ij} = y_i - y_j$ and $k_{ij} = w(y_i, y_j)$, the gradient of $\mathcal{L}$ with respect to $L$ is

$$\frac{\partial \mathcal{L}}{\partial L} = 4\sum_{i=1}^{N}\sum_{j \neq i}\frac{k_{ij}}{\sum_{l \neq i} k_{il}}\,\bigl\langle \hat{V}_i - V_i,\ \hat{V}_i - V_j \bigr\rangle\, y_{ij}\, y_{ij}^{T} L,$$

where $\langle\cdot,\cdot\rangle$ is the inner product summed over all pixels and frames. Using the plantar pressures and demographics of our healthy controls, we employ gradient descent to solve for the decomposition matrix $L$. This matrix can then be used to generate patient-specific baseline plantar pressures using Equations (3) and (4). Note that this optimization has no hyperparameters that need to be fine-tuned. 14
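A compact sketch of the leave-one-out loss in Equation (6), its gradient with respect to $L$, and a gradient-based fit follows. It is an illustration under stated assumptions, not the authors' implementation: videos are flattened to vectors of length K (pixels × frames), `L` is an unconstrained D × D matrix (any real `L` already gives a positive semidefinite $P = LL^T$, so the Cholesky triangularity is not enforced), and the optimizer is plain gradient descent with a simple accept/reject step-size rule whose initial step, growth and shrink factors, and iteration count are arbitrary choices of this sketch. The function names are hypothetical.

```python
import numpy as np


def mlkr_loss_and_grad(L, V, Y):
    """Leave-one-out kernel regression loss and its gradient w.r.t. L.

    V: (N, K) healthy-control videos, each flattened to K = pixels * frames.
    Y: (N, D) demographics.  L: (D, D), so that P = L @ L.T.
    """
    diff = Y[:, None, :] - Y[None, :, :]            # (N, N, D): y_ij = y_i - y_j
    dist = np.sum((diff @ L) ** 2, axis=-1)         # y_ij^T L L^T y_ij
    np.fill_diagonal(dist, np.inf)                  # leave one out: k_ii = 0
    # Row-wise shift for numerical stability; it cancels in k / K below.
    k = np.exp(-(dist - dist.min(axis=1, keepdims=True)))
    K = k.sum(axis=1, keepdims=True)
    V_hat = (k @ V) / K                             # leave-one-out reconstructions
    resid = V_hat - V
    loss = float(np.sum(resid ** 2))
    # c_ij = <V_hat_i - V_i, V_hat_i - V_j>, inner products over all pixels/frames
    c = np.sum(resid * V_hat, axis=1, keepdims=True) - resid @ V.T
    coef = 4.0 * (k / K) * c
    # grad = sum_ij coef_ij * y_ij y_ij^T L
    grad = np.einsum("ij,ijd,ije->de", coef, diff, diff) @ L
    return loss, grad


def fit_metric(V, Y, step=1e-2, n_iter=200):
    """Gradient descent on L with a simple accept/reject step-size rule."""
    V = np.asarray(V, dtype=float)
    Y = np.asarray(Y, dtype=float)
    L = np.eye(Y.shape[1])                          # start from P = I (a choice of this sketch)
    loss, grad = mlkr_loss_and_grad(L, V, Y)
    for _ in range(n_iter):
        L_new = L - step * grad
        loss_new, grad_new = mlkr_loss_and_grad(L_new, V, Y)
        if loss_new < loss:                         # accept the step and enlarge it
            L, loss, grad = L_new, loss_new, grad_new
            step *= 1.2
        else:                                       # reject the step and shrink it
            step *= 0.5
    return L, loss
```

Because only steps that lower the loss are accepted, the returned loss never exceeds the loss at the identity initialization. The analytic gradient is easy to verify against central finite differences of the loss, which is worth doing before trusting any fitted `L`.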

Outlier Detection
Once a distance metric is learned over the demographics, patient-specific statistical baselines are computed at each pixel of the plantar pressure video, and the SPM framework can then be used to identify outliers. To do so, the plantar pressure video of the patient under evaluation is checked for anomalies using z-scores 4

$$Z_{\text{test}}(x,t) = \frac{V_{\text{test}}(x,t) - M_{\text{test}}(x,t)}{S_{\text{test}}(x,t)}.$$

Pixel-by-pixel anomalies are then defined as those whose z-scores are statistically significant at $\alpha = 0.05$, following multiple comparison correction using random field theory. 10
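The z-score map of the patient against their baseline is a pixel-wise operation. The sketch below pairs it with a Bonferroni threshold purely as a stand-in for the random field theory correction described above, which is not implemented here; Bonferroni ignores the spatiotemporal smoothness of the z-score field and is typically more conservative than random field theory for smooth data. The small floor on the standard deviation is an assumption of this sketch that avoids division by zero at pixels where the baseline standard deviation is zero. Function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def z_scores(V_test, M_test, S_test, s_floor=1e-12):
    """Pixel-wise z-scores of the patient video against the patient-specific baseline."""
    S_safe = np.maximum(np.asarray(S_test, dtype=float), s_floor)
    return (np.asarray(V_test, dtype=float) - M_test) / S_safe


def bonferroni_outliers(Z, alpha=0.05):
    """Two-sided outlier mask with a Bonferroni-corrected threshold.

    Stand-in only: the text uses random field theory instead.
    """
    m = Z.size                                      # number of pixel-frame tests
    z_crit = NormalDist().inv_cdf(1.0 - alpha / (2 * m))
    return np.abs(Z) > z_crit, z_crit
```

Swapping in a random field theory threshold would only change how `z_crit` is computed; the z-score map itself stays the same.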

EXPERIMENTAL SETUP
To evaluate the proposed outlier detection technique, we employ three datasets of plantar pressure videos. First, an internal dataset of 430 healthy controls is used as the training set to learn the distance metrics over the demographic factors. This dataset is also used to generate the baseline mean and standard deviation videos. Second, the CAD WALK healthy controls database (http://doi.org/10.5281/zenodo.1265420) contains plantar pressure videos from 55 healthy individuals. These measurements are used to validate how well the estimated baselines match real plantar pressure videos from healthy individuals. Finally, the CAD WALK hallux valgus database (http://doi.org/10.5281/zenodo.2598496) is used to evaluate how well the proposed technique identifies the plantar pressure anomalies that are known to exist in this patient population. 2 This dataset consists of 69 hallux valgus cases measured from the feet of 50 patients (19 of these 50 patients have both feet affected). All plantar pressure videos were collected using calibrated rs scan footscan pressure-sensing plates (rs scan, Paal, Belgium). The CAD WALK healthy controls were measured at 500 Hz, whereas the other two groups were measured at 200 Hz. Also, participants in both CAD WALK datasets were measured using a three-step protocol (i.e., the third step of the walk is measured), whereas the internal dataset was collected using an eight-step protocol. 5 The study was approved by the internal review committee of the Sint Maartenskliniek and met the requirements for exemption from Medical Ethics Committee review under the Dutch Medical Research Involving Human Subjects Act. The study was performed in accordance with the Declaration of Helsinki.
To test our proposed outlier detection technique (subsequently labeled FULL for full metric learning), we compared it to five competing approaches. The first competing technique was a one-dimensional version of the metric (subsequently labeled 1D) where $P = \sigma I$ ($\sigma$ is an unknown scaling parameter and $I$ is the identity matrix). The one-dimensional version is conceptually equivalent to the approach of Serag et al. 12 The second technique was the IID approach described in Equations (1) and (2) and used in previous work 4 (subsequently labeled IID). The third technique (subsequently labeled DIAG) uses a diagonal $L$ matrix for the metric. This approach serves as a middle ground between our proposed FULL metric approach and the 1D approach of Serag et al. 12 The fourth competing approach is PAPPI, 2 which uses linear regression to model the impact of the demographics on the plantar pressures. Finally, we also compare against robust contextual outlier detection (ROCOD), 8 which combines the linear regression baseline of PAPPI with an IID baseline calculated over a local neighborhood defined by the demographics. For ROCOD, the local demographic neighborhoods were empirically defined to contain the samples whose min-max normalized demographics are within a Euclidean distance of $\tau = 0.25$ of the test patient.
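The three metric variants compared here differ only in how $P$ is parametrized, and the ROCOD neighborhood is a thresholded distance in normalized demographic space. A minimal sketch of both, with hypothetical helper names; computing the min-max ranges from the training demographics is an assumption of this sketch, as is the guard for demographic columns that are constant.

```python
import numpy as np


def metric_matrix(kind, params, D):
    """Build P for the 1D, DIAG, and FULL variants (hypothetical helper)."""
    params = np.asarray(params, dtype=float)
    if kind == "1D":
        return params[0] * np.eye(D)               # P = sigma * I, with sigma >= 0
    if kind == "DIAG":
        L = np.diag(params)                        # diagonal L, so P is diagonal
    elif kind == "FULL":
        L = np.zeros((D, D))
        L[np.tril_indices(D)] = params             # lower-triangular Cholesky factor
    else:
        raise ValueError(f"unknown metric kind: {kind}")
    return L @ L.T


def rocod_neighbors(Y, y_test, tau=0.25):
    """Indices of controls within Euclidean distance tau after min-max normalization."""
    Y = np.asarray(Y, dtype=float)
    lo, hi = Y.min(axis=0), Y.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)         # guard against constant columns
    Yn = (Y - lo) / span
    yn = (np.asarray(y_test, dtype=float) - lo) / span
    return np.flatnonzero(np.linalg.norm(Yn - yn, axis=1) <= tau)
```

DIAG weights each demographic factor independently, whereas the off-diagonal entries of FULL let $P$ capture interactions between factors, which is the distinguishing feature in the comparison.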
To quantify the performance of each outlier detection technique, we perform three experiments. First, we compare the estimated baseline mean plantar pressure videos to real measured plantar pressures from healthy controls. Differences between the two videos are quantified using the average per-pixel absolute error. We hypothesize that our FULL metric learning approach will achieve the lowest absolute errors. Second, we measure baseline variability using the average per-pixel magnitude of the baseline standard deviation videos. We hypothesize that our FULL metric learning approach will show the lowest variability of all the algorithms. Third, we perform outlier detection on both the hallux valgus patients and the healthy controls. We hypothesize that our FULL metric learning approach will be more sensitive to outliers than the competing algorithms. Additionally, we hypothesize that, for the hallux valgus patients, our FULL metric learning approach will detect plantar pressure outliers that agree better with previous hallux valgus studies than those of the competing approaches. Specifically, we expect to see outliers around the hallux (i.e., big toe), where the foot condition is present. 2

RESULTS
Figure 2 shows an example of our FULL outlier detection technique on a plantar pressure video from one of our 69 hallux valgus cases. Note that the estimated baseline plantar pressure video shows a realistic pressure pattern and that the detected outliers under the midfoot, toes, and metatarsal 1 agree with what is commonly seen in hallux valgus patients. 2 Similar figures for all plantar pressure datasets and all algorithms are provided as supplementary material. Figure 3(a) shows the average per-pixel absolute error between the measured plantar pressures from healthy controls and the baseline mean images from each statistical baseline.
Both the DIAG and FULL metric learning approaches produced baseline mean images that were closest to the real measured plantar pressures, each with an average error of 26.56 kPa. These errors were 1.53% to 3.15% lower than those of the competing algorithms. Paired t-tests showed that the FULL metric learning approach produced significantly lower errors than the IID and 1D approaches ($p = 0.004$ versus IID, $p = 0.009$ versus 1D). Figure 3(b) shows the average per-pixel standard deviations for each statistical baseline estimation technique. The inclusion of the demographic factors significantly reduced the variability in the baseline, with our FULL metric learning approach showing the lowest variability on average. These decreases in variability were between 0.67% and 8.66% in magnitude. Paired t-tests show that FULL metric learning produced significantly lower baseline variability than IID ($p < 10^{-10}$), 1D ($p < 10^{-10}$), DIAG ($p < 4 \times 10^{-4}$), and PAPPI ($p = 0.030$).
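The two baseline-quality measures described in the experimental setup (average per-pixel absolute error and average per-pixel standard deviation) and the paired comparison between techniques can be sketched as below. Averaging over every pixel and frame, rather than only over pixels in contact with the plate, is an assumption of this sketch, and the function names are hypothetical. The sketch computes only the paired t statistic; a p-value additionally needs the t distribution with n − 1 degrees of freedom, for example via SciPy's `scipy.stats.ttest_rel`.

```python
import numpy as np


def per_subject_mean_abs_error(measured, predicted):
    """Average per-pixel absolute error for each subject; arrays shaped (n_subjects, ...)."""
    err = np.abs(np.asarray(measured, dtype=float) - np.asarray(predicted, dtype=float))
    return err.reshape(err.shape[0], -1).mean(axis=1)


def per_subject_mean_std(std_videos):
    """Average per-pixel magnitude of each subject's baseline standard deviation video."""
    s = np.asarray(std_videos, dtype=float)
    return s.reshape(s.shape[0], -1).mean(axis=1)


def paired_t_statistic(a, b):
    """Paired t statistic for per-subject scores of two techniques (df = n - 1)."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))
```

Pairing by subject matters here because every technique is evaluated on the same healthy controls, so between-subject differences cancel in `d`.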
For each dataset and outlier detection technique, the average percentage of pixels in the plantar pressure videos that were identified as outliers is shown in Figure 3(c). As expected, the hallux valgus patients showed more pressure outliers than the healthy controls. Also, the number of outliers increased as the statistical baselines became more accurate and less variable, suggesting improved outlier sensitivity. In this respect, FULL and ROCOD detect the greatest numbers of outliers. Figure 4 shows spatiotemporal histograms (i.e., pixel-by-pixel counts) of the detected outliers for each of the five outlier detection approaches on the left feet of hallux valgus patients. Note that for hallux valgus patients, abnormal plantar pressures are expected at the location of the foot condition: under the hallux (i.e., big toe) and metatarsal 1 (i.e., at the base of the big toe). The FULL and DIAG metric learning approaches equally show the highest number of outliers in these areas, as indicated by the red arrows in Figure 4(a) and (c). Additionally, there is evidence that hallux valgus patients are more likely to have flat feet. 2 The FULL and DIAG metric learning approaches also both identify outliers in the midfoot in Figure 4(b), suggesting that they are detecting these flat feet. It is also worth noting that the non-IID outlier detection […] Finally, Figure 5 shows the precision matrices $P$ learned for the 1D, DIAG, and FULL metric learning approaches. We observed that a person's weight has the largest impact on their plantar pressures and that notable interactions are present between all demographic factors. These results further emphasize the need for non-IID outlier detection.
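The summaries behind Figure 3(c) and Figure 4 reduce to simple counts over boolean outlier masks. A sketch, assuming one mask per subject stacked along axis 0 with time as the last axis; the function names are hypothetical, and picking the frame nearest a given fraction of stance phase is an illustrative choice of this sketch.

```python
import numpy as np


def outlier_percentage(masks):
    """Percentage of pixel-frame values flagged as outliers, per subject."""
    m = np.asarray(masks, dtype=bool)
    return 100.0 * m.reshape(m.shape[0], -1).mean(axis=1)


def outlier_histogram(masks):
    """Spatiotemporal histogram: number of subjects flagged at each pixel and frame."""
    return np.asarray(masks, dtype=bool).sum(axis=0)


def frame_at_stance_fraction(hist, fraction):
    """Slice of the histogram at a fraction (0 to 1) of stance phase; time is the last axis."""
    n_frames = hist.shape[-1]
    return hist[..., int(round(fraction * (n_frames - 1)))]
```

Averaging the per-subject percentages over a dataset gives the kind of summary shown in Figure 3(c), and slicing the histogram at 0.25, 0.5, and 0.75 of stance phase corresponds to the panels described for Figure 4.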

DISCUSSION
Overall, the statistical baseline estimates improved as the impact of the demographic factors was modeled more extensively and more flexibly. Our FULL metric learning approach also integrates well into the SPM framework, enabling outlier detection across whole plantar pressure videos. As a result of these improved statistical baselines, our approach identified more outlier plantar pressures and, more importantly, those outliers agreed better with previously reported results on hallux valgus patients. These results suggest that our proposed technique improves the sensitivity and reliability of outlier detection in plantar pressure videos.
FIGURE 4. Histograms of detected outliers for the hallux valgus patient dataset (left foot). Note that the FULL and DIAG metric learning approaches show more outliers in the midfoot (at 50% into stance phase), hallux (at 25% into stance phase), and metatarsal 1 (at 75% into stance phase), results that agree best with previous hallux valgus studies. (a) 25% into stance phase.
Nevertheless, some results in our study suggest caution in interpreting the detected outliers. First, the number of outliers detected for the healthy controls was higher than we would expect. When we qualitatively evaluated the outliers for this dataset (see supplementary materials), we observed that they are generally caused by two effects: errors in spatiotemporal alignment between the baseline and measured plantar pressure videos, and harder heel strikes in the CAD WALK healthy control dataset than in our internal healthy control dataset. The former is a limitation of all SPM-style outlier detection algorithms and should be checked manually each time an algorithm like this is used. 2,3 The latter may be related to a difference in data collection between the two datasets. 5 The CAD WALK dataset was collected using a three-step protocol, whereas our internal dataset used an eight-step protocol. As a result, the CAD WALK participants may have been in a less natural walking rhythm at the time of measurement than the participants in our internal training dataset. We intend to investigate whether this is indeed the case. Additionally, Figure 3(b) shows a number of large outliers in our proposed estimation of baseline standard deviations. We hypothesize that the increased baseline variability for these individuals may be related to how demographically similar they are to the individuals in the training database. Note that the proposed algorithm creates statistical baselines by interpolating the plantar pressure videos in the database, and that this interpolation is based on demographic factors. If a patient with notably different demographic measurements presents themselves, the training database contains no similar individuals from whom a good statistical baseline can be interpolated. In such cases, extrapolation is required.
Our future work will investigate whether recent advances in machine learning can provide this extrapolation capability.
Finally, it is worth noting that the proposed technique (incorporating contextual outlier detection into SPM) has applications beyond the analysis of plantar pressure videos. SPM has seen extensive use in the field of biomechanics, 10 where the proposed technique could be used to identify form breaks in athletes. Medical imaging also makes extensive use of SPM, 4 and the proposed technique could be applied there for computer-aided diagnosis. The geosciences also employ SPM for the study of hyperspectral images, 9 where the proposed technique could be used to detect, for example, groundwater contamination. These applications remain to be explored in future work.

CONCLUSION
We have proposed an algorithm for contextual statistical outlier detection, incorporated it into the SPM framework, and applied it to outlier detection in plantar pressure videos. To account for the contextual effects of multiple demographic factors, we employed metric learning to generate patient-specific plantar pressure baselines for each pixel of each frame of the plantar pressure videos. We observed that, for healthy individuals, our proposed patient-specific baselines were closer to the measured plantar pressures than those of an IID approach or linear models. This leads to an outlier detection technique that is more sensitive to pressure outliers and appears to detect pressure outliers in better agreement with the clinical literature. While more study is needed, this non-IID outlier detection approach may ultimately improve the ability to diagnose foot complaints from plantar pressure videos.