Breast Mass Detection And Classification Algorithm Based On Temporal Subtraction Of Sequential Mammograms

Breast cancer screening with mammography is the most efficient way to reduce breast cancer mortality. However, the large screening population and the use of double reading create a high workload that heavily burdens radiologists. Hence, Computer-Aided Detection (CAD) systems are being developed to assist radiologists. In this study, an algorithm for the automatic detection and classification of breast masses, based on temporal subtraction of sequential mammograms, image registration and machine learning, is presented. While some previous studies proposed temporal analysis by creating a new feature vector, temporal subtraction takes the entire prior image into consideration. A new dataset, consisting of 40 cases (two time points and two views of each breast per patient, a total of 160 images), with precisely annotated mass locations was created. The accuracy of the classification of masses as benign or suspicious increased from 85.7% (using temporal analysis) to 92.9% (using temporal subtraction). The improvement was statistically significant with p < 0.05. These results demonstrate the effectiveness of temporal subtraction for the diagnosis of breast masses.


INTRODUCTION
Breast cancer remains a significant medical concern in women's health despite recent improvements in survival rates. Radiologists worldwide use mammography as a reliable tool for breast cancer screening [1]. In the case of positive findings, the most appropriate disease management route is followed. According to the Breast Imaging Reporting and Data System (BI-RADS), numerous types of abnormalities can be found in a mammogram, including masses of various sizes and shapes [2]. Masses are characterized by their size, shape, intensity and texture, and radiologists examine those properties in order to classify them as benign or suspicious [3]. This classification is challenging, not only because of the large variation in size and shape of masses, but also because of their low image contrast [4]. Thus, Computer Aided Diagnosis (CAD) systems are being developed to help radiologists with the detection and classification of breast masses [3].
While most studies in the literature propose different algorithms for the diagnosis of breast masses using the most recent mammogram [1,3,5], few methods compare recent and prior mammograms. Radiologists routinely exploit such comparisons to identify newly developing abnormalities. Some studies have assessed the use of temporally sequential images in a technique called temporal analysis [6,7,8]. In temporal analysis, the detected Regions of Interest (ROIs) from the recent image are associated with the corresponding ROIs in the prior one, using image registration algorithms. Then, a new temporal feature vector is created and used for the classification. Although the findings are promising, only parts of the prior image are used in the analysis.
In this study, an automatic algorithm for the detection and classification of breast masses using subtraction of temporally sequential mammograms is presented. Temporal subtraction was previously developed by this group and applied successfully to the diagnosis of micro-calcifications [9]. Compared to temporal analysis, temporal subtraction uses the entire prior image which, after registration, is subtracted from the recent one. The unchanged regions are effectively removed and the Contrast Ratio (CR) is increased. Subsequently, False Positive (FP) detections are eliminated and the true masses are classified as benign or suspicious. Temporal analysis was also performed in this study for comparison purposes.

Dataset
For the purposes of this study, a new dataset was created, since the open access databases provide only the recent images for each patient and those images are not precisely annotated to show the exact margins of a mass, limiting their value as ground truth. The dataset consists of 40 pairs of full-field digital sequential mammograms from Craniocaudal (CC, from above) and Mediolateral Oblique (MLO, side-angle) views, collected from 40 women undergoing routine screening mammography examinations (a total of 160 images). Consecutive mammograms were collected from 2012 to 2019 and the age of the participants varied from 40 to 75, with a mean age of 57.48 ± 8.96 years and a median age of 57.50. Half the cases belonged to patients with at least one suspicious mass in the most recent mammogram. The remaining cases were obtained from healthy individuals without any signs of malignancy in either screening. These cases were selected to form a matched group compared to those with suspicious findings. This dataset also includes detailed annotations of the boundary of each mass (both benign and suspicious) to be used as the ground truth (Fig. 2). The selection and annotation were performed by two expert radiologists (ten and two years of experience).

Mass Detection and Classification using Temporal Subtraction
In temporal subtraction, the prior and recent images were processed in parallel. Figure 1 shows the proposed methodology. First, normalization was applied to adjust the range of pixel intensity values, and then Contrast Limited Adaptive Histogram Equalization (CLAHE), Gamma correction and border removal took place. CLAHE is based on the re-allocation of the gray levels inside an image using the probability distribution of the input gray levels. With this technique, the maximum contrast enhancement factor is affected by the clip level that is defined by the user [10]. Contrast adjustment using Gamma correction was used to account for the non-linear mapping of intensities [11]. Border removal eliminated the high intensity areas connected to the border (including the pectoral muscle) [12].
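The normalization and Gamma correction steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the min-max form of the normalization and the gamma value of 0.5 are assumptions, and CLAHE and border removal are deliberately left out.

```python
import numpy as np

def normalize(image):
    """Min-max rescale pixel intensities to [0, 1] (assumed form of normalization)."""
    image = np.asarray(image, dtype=np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:  # flat image: avoid division by zero
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)

def gamma_correct(image, gamma):
    """Apply the power-law mapping out = in ** gamma to an image in [0, 1]."""
    return np.power(image, gamma)

# Toy 2x2 "mammogram" with arbitrary intensity values.
raw = np.array([[100, 200], [300, 500]])
norm = normalize(raw)
# gamma = 0.5 is an arbitrary example value; gamma < 1 brightens mid-tones.
enhanced = gamma_correct(norm, gamma=0.5)
```

With gamma below 1 the darker mid-range values are lifted, while the end points 0 and 1 stay fixed.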
For an effective subtraction of the sequential images, registration is required. Registration compensates for changes that occur in the breast over time and for differences in the mammography procedure. Several registration techniques have been proposed in the literature [13]. In this study, Demons registration was used. Demons is a local technique that aligns the moving (prior) image to the fixed (recent) one, using regional similarity and location. In Demons, the registration is viewed as a diffusion process and can be expressed as an energy function with respect to the update field, defined using the fixed image, the moving image and a transformation field [14].
After the registration, the registered prior image was subtracted from the recent one, to eliminate the regions that remained unchanged between the screenings, and the CR of the subtracted image was compared to that of the most recent image. Afterwards, the subtracted image was processed using unsharp-mask filtering, a spatial filter that subtracts a blurred version of the original image to create a new enhanced image [15]. For the segmentation of breast masses, the difference image was converted to binary using Otsu's thresholding and then morphological operations (erosion with a radius of 2 pixels and closing with a radius of 10 pixels) were applied. The remaining ROIs were considered as possible masses.
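The subtraction and thresholding steps above can be illustrated with a small sketch. It assumes the prior image has already been registered to the recent one, clips negative differences to zero (an assumption, not stated in the text), and omits the unsharp-mask filtering and morphological operations. The function name and the toy images are hypothetical.

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Return the threshold that maximizes the between-class variance (Otsu)."""
    hist, edges = np.histogram(image, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    weight_bg = np.cumsum(hist).astype(np.float64)  # pixels at or below each bin
    weight_fg = weight_bg[-1] - weight_bg           # pixels above each bin
    cum_sum = np.cumsum(hist * centers)
    valid = (weight_bg > 0) & (weight_fg > 0)       # both classes non-empty
    mean_bg = np.zeros(bins)
    mean_fg = np.zeros(bins)
    mean_bg[valid] = cum_sum[valid] / weight_bg[valid]
    mean_fg[valid] = (cum_sum[-1] - cum_sum[valid]) / weight_fg[valid]
    between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
    return centers[np.argmax(between)]

# Toy registered pair: identical background, one new 3x3 bright region.
prior = np.full((8, 8), 0.2)
recent = prior.copy()
recent[2:5, 2:5] = 0.9

# Unchanged tissue cancels out; only the new region survives the subtraction.
difference = np.clip(recent - prior, 0.0, None)
mask = difference > otsu_threshold(difference)
```

In this toy case the binary mask contains only the nine pixels of the newly appearing region, which is the behaviour the subtraction is meant to achieve.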
Machine learning was performed to eliminate the large number of FP detections. In total, 28 features were extracted, divided into four major categories: shape, intensity, First-Order Statistics (FOS) and Gray Level Co-occurrence Matrix (GLCM) features. Those features were selected for their capability to identify breast masses based on their unique characteristics. They include: area, circularity, compactness, convex area, eccentricity, equivalent diameter, Euler number, extent, filled area, major and minor axis length, orientation, perimeter, solidity, shape ratio, average, min and max intensity, entropy, kurtosis, skewness, smoothness, standard deviation, variance, contrast, correlation, energy and homogeneity. A detailed description of some features can be found in a previous publication of this group [9]. Feature selection was then applied, using feature importance, to eliminate the least important features and thereby increase the classification performance. Feature importance is based on decision trees and returns a score for each available feature in the dataset. This value is calculated as the decrease in node impurity weighted by the probability of reaching that node. The higher the score, the more important or relevant the feature is. Eight classifiers were assessed: Linear Discriminant Analysis (LDA), 5-Nearest Neighbors (5-NN), Support Vector Machines (SVM), Decision Trees (DT), Extra Trees (ET), Bagging, Voting and Artificial Neural Network (ANN), with leave-one-patient-out cross-validation. Each time, the four images of a single patient (CC and MLO views of the most recent and prior mammograms) were used as the test set, while the remaining patients were used for the training of the model, until all the cases were classified. With this approach, bias is avoided since all the ROIs of a single patient were used either in the training or in the testing set, but not in both.
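The leave-one-patient-out scheme can be sketched in a few lines. This is an illustrative sketch only: the function name and the patient labels are hypothetical, and each ROI is assumed to carry the identifier of the patient it came from.

```python
def leave_one_patient_out(patient_ids):
    """Yield (held_out_patient, train_indices, test_indices), one fold per patient.

    All ROIs of the held-out patient go to the test set, so no patient
    contributes ROIs to both training and testing in the same fold.
    """
    for held_out in sorted(set(patient_ids)):
        test = [i for i, p in enumerate(patient_ids) if p == held_out]
        train = [i for i, p in enumerate(patient_ids) if p != held_out]
        yield held_out, train, test

# Hypothetical example: six ROIs detected in three patients.
roi_patient_ids = ["P1", "P1", "P2", "P3", "P3", "P3"]
folds = list(leave_one_patient_out(roi_patient_ids))
```

Libraries such as scikit-learn offer an equivalent grouped splitter (`LeaveOneGroupOut`); the plain version here only makes the grouping by patient explicit.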
In the first round, the detected ROIs were classified as masses or normal tissue and in the second round, the true masses were further classified as benign or suspicious. The performance of classification was evaluated by measuring sensitivity, specificity, accuracy and the Area Under the Receiver Operating Characteristics (ROC) Curve (AUC).
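For reference, sensitivity, specificity and accuracy can be computed from the predicted and true labels as sketched here. The function names and the toy labels are hypothetical, both classes are assumed to be present, and the AUC is omitted because it requires continuous classifier scores rather than hard labels.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false negatives, true negatives and false positives."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp

def screening_metrics(y_true, y_pred, positive=1):
    """Return sensitivity, specificity and accuracy (assumes both classes occur)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    return {
        "sensitivity": tp / (tp + fn),  # fraction of positives detected
        "specificity": tn / (tn + fp),  # fraction of negatives rejected
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Toy labels: 1 = suspicious, 0 = benign (hypothetical values).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
metrics = screening_metrics(y_true, y_pred)
```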

Mass Detection and Classification using Temporal Analysis
Temporal analysis was also performed for comparison with temporal subtraction. For this purpose, the same features were extracted from both the recent and the registered prior image, and then subtracted, for the creation of a new temporal feature vector. The previously described methodology was optimized for the temporal feature vector and the same procedure was followed for the validation and evaluation, so that the results would be comparable.
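The construction of the temporal feature vector can be sketched as a feature-wise difference. This is an illustration under assumptions: the direction of the subtraction (recent minus prior), the function name and all feature values are hypothetical.

```python
def temporal_feature_vector(recent, prior):
    """Subtract each prior-image feature from the matching recent-image feature."""
    if recent.keys() != prior.keys():
        raise ValueError("recent and prior ROIs must share the same features")
    return {name: recent[name] - prior[name] for name in recent}

# Hypothetical feature values for one ROI and its registered prior counterpart.
recent_roi = {"area": 420.0, "average_intensity": 0.75, "entropy": 5.5}
prior_roi = {"area": 300.0, "average_intensity": 0.5, "entropy": 5.0}
temporal = temporal_feature_vector(recent_roi, prior_roi)
```

Unlike temporal subtraction, this operates only on features of matched ROIs, so information from the rest of the prior image does not enter the classifier.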

RESULTS
The performance of temporal subtraction was evaluated by measuring the CR of the subtracted image and comparing it to the corresponding CR of the most recent mammogram without any processing (Figs. 3 and 4). Overall, the CR increased. As expected, the implementation of temporal subtraction increases the CR, effectively removes the background and renders the new changes more visible to the radiologists. For the first round, the selected features included area, compactness, convex area, Euler number, major and minor axis length, orientation, perimeter, shape ratio and variance. Using those, Voting achieved 98.2% accuracy and 0.91 AUC (Table 1). Likewise, for the second round, area, compactness, convex area, equivalent diameter, extent, major and minor axis length, orientation, perimeter and solidity were selected. ANN reached 92.9% accuracy along with 0.92 AUC (Table 2).
Temporal analysis was also performed for comparison purposes. For the elimination of FPs, the selected features included min and average intensity, entropy, skewness, smoothness, standard deviation, variance, contrast, correlation and homogeneity. Voting achieved 83.8% accuracy and 0.79 AUC (Table 1). For the second round, the best combination of features was average and max intensity, entropy, kurtosis, smoothness, standard deviation, contrast, correlation, energy and homogeneity. Using those, ANN reached 85.7% accuracy along with 0.82 AUC (Table 2).
The highest classification performance was achieved using ANN with temporal subtraction, at 92.9% accuracy. One benign mass (out of 12) was misclassified as suspicious and 3 suspicious masses (out of 44) were misclassified as benign. These results are clearly superior to those obtained using temporal analysis, where ANN provided 85.7% accuracy, with 4 benign and 4 suspicious masses misclassified. Using temporal subtraction, the accuracy of mass classification increased by approximately 7 percentage points, a statistically significant improvement (p < 0.05). Temporal subtraction has recently been shown to be effective for the detection and classification of micro-calcifications, only by this group [9]. Studies related to temporal subtraction for the diagnosis of breast masses were not found in the literature; hence, direct comparison with other studies was not possible. Some groups assessed the use of temporal analysis for the detection of masses with promising results (0.88 AUC [6], 0.77 AUC [7], 0.90 AUC [8]), as also shown in this study (0.82 AUC). The results presented here show that temporal subtraction can achieve higher classification performance. It is important to note that in some studies, the ROIs were randomly divided for the validation and training. In this work, leave-one-patient-out cross-validation was used, dividing the patients, and not the ROIs, to avoid any bias.
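The reported second-round accuracies follow directly from the misclassification counts given above, as this short check shows. It assumes that both methods were evaluated on the same 56 masses (12 benign and 44 suspicious), which is consistent with the reported percentages; the function name is hypothetical.

```python
def accuracy(total, misclassified):
    """Fraction of correctly classified masses."""
    return (total - misclassified) / total

total_masses = 12 + 44                                      # benign + suspicious
subtraction_acc = accuracy(total_masses, 1 + 3)             # 52 of 56 correct
analysis_acc = accuracy(total_masses, 4 + 4)                # 48 of 56 correct
gain_points = 100 * (subtraction_acc - analysis_acc)        # percentage points

print(f"temporal subtraction: {100 * subtraction_acc:.1f}%")  # 92.9%
print(f"temporal analysis:    {100 * analysis_acc:.1f}%")     # 85.7%
print(f"difference:           {gain_points:.1f} points")      # 7.1 points
```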

CONCLUSION
This study illustrates that the use of temporal subtraction for the classification of breast masses results in significantly higher performance compared to temporal analysis. The proposed methodology was evaluated on a dataset containing 40 patients and, in total, 160 images were used (two time points and two views of each breast). With temporal subtraction, the unchanging details in the images were eliminated and the CR improved by approximately 7x. Additionally, the classification accuracy increased by approximately 7 percentage points compared to temporal analysis (92.9% vs. 85.7% accuracy), demonstrating the effectiveness of the proposed technique. Given the small dataset, further studies must be conducted with more image pairs and different validation methods. With further expansion and improvement, the methodology introduced in this study has the potential to substantially contribute to the development of automated CAD systems with significant impact on patient prognosis.

COMPLIANCE WITH ETHICAL STANDARDS
The corresponding Institutional Review Board (National Bioethics Committee) approved the collection of the data and their use in this study. All data included in this study were collected in accordance with the ethical standards of the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.