Breast Mass Detection and Classification based on Digital Temporal Subtraction of Mammogram Pairs

Breast cancer is one of the deadliest malignancies worldwide. In mammography, the most reliable screening tool for its diagnosis, expert radiologists review the mammograms to determine whether the patient has any signs of disease. Unfortunately, the evaluation of breast abnormalities is challenging, even for experienced radiologists. Computer-Aided Detection (CAD) systems can assist in the detection of breast cancer. In this work, an algorithm for the automatic detection and classification of masses, based on subtraction of sequential digital mammograms, image registration and machine learning, is presented. Previous studies assessed the use of sequential mammograms to perform temporal analysis by creating a new temporal feature vector. Temporal subtraction registers and subtracts the prior mammogram from the current one, prior to performing mass detection and classification. A new dataset, which includes sequential pairs from 40 patients (160 mammograms) with precisely annotated mass locations (benign and suspicious), was created to assess the performance of the algorithm. For the classification, various features were extracted and six classifiers were used in a leave-one-patient-out cross-validation. The accuracy of the classification of masses as benign or suspicious increased from 90.83% (with the previously described temporal analysis) to 96.51% (with temporal subtraction). The improvement was statistically significant with p < 0.05. These results demonstrate the effectiveness of the proposed technique of temporal subtraction of mammograms for the detection of masses.


I. INTRODUCTION
Breast cancer is the most frequently diagnosed cancer and the leading cause of cancer mortality in women worldwide. Mammography is extensively used as a screening and diagnostic tool [1]. Radiologists evaluate the mammograms manually to determine whether the patient has signs of malignancy. In the case of positive findings, the appropriate disease management is followed. According to the Breast Imaging Reporting and Data System (BI-RADS), an important sign of breast cancer that radiologists look for is a mass [2]. A breast mass is associated with a localized swelling or lump inside the breast and typically appears as a relatively dense region. Breast masses are defined by their size, shape, density and texture, and their characteristics (i.e. contrast, distortion) are important in their assessment. Radiologists examine the mass properties in order to classify them as benign or suspicious. This categorization is one of the critical tasks in mammography, since it determines subsequent management. It is also challenging, not only due to the large variation in the size and shape of masses, but also due to their low image contrast [3]. Computer-Aided Diagnosis (CAD) systems are being developed to address the difficulty of diagnosing breast masses. Although various algorithms have been proposed [1], [3], [4], there is still room for improvement, especially considering that few existing methods compare the recent mammograms with the older ones, something that radiologists routinely do to identify abnormalities more accurately.
Only a few studies in the literature assessed the use of sequential mammograms for the detection of breast masses using temporal analysis [5]-[7]. Regions of Interest (ROIs) in the recent image were associated with corresponding ROIs in the prior one. A new temporal feature vector was created by subtracting the numerical values of the features, extracted from each prior image ROI, from the corresponding features of the recent image ROI. This vector was then used for the classification of the ROIs. Overall, the findings confirmed that subtle signs of malignancy could be recognized only with the addition of information from prior mammograms. Moreover, the classification performance increased, compared to using only the features from the recent mammograms. However, only ROIs were included and temporal information from other regions was not available.
In this work, a new approach for the automatic detection and classification of masses, based on subtraction of sequential digital mammograms, is presented. Temporal subtraction, previously demonstrated for micro-calcification detection [8], differs from temporal analysis in that the entire prior image is subtracted from the recent one, rather than only the features of selected ROIs. With the application of temporal subtraction, unchanged overlapping regions were effectively removed, resulting in an increase in the Contrast Ratio (CR). Subsequently, falsely identified regions were eliminated and the remaining masses were classified as benign or suspicious. The results were compared to those of temporal analysis and showed the proposed technique to be more effective, with superior classification performance.

II. MATERIALS AND METHODS
A. Dataset
In this study, a new dataset of mammographic data was created, since the available open-access breast cancer databases do not include sequential sets of mammograms, but rather only the most recent ones. In addition, their images are not precisely annotated to show the exact margins of a mass, limiting their value as ground truth. The new clinical mammogram dataset consists of 40 pairs of full-field digital sequential mammograms in CranioCaudal (CC, from above) and MedioLateral Oblique (MLO, side-angle) views. A total of 160 images were collected from various hospitals across the country. The selection and annotation were made by a radiologist with ten years of experience, and a second radiologist, with two years of experience, validated the images. Consecutive mammograms were collected from 2012 to 2019, with an interval of 1 to 7 years between the two screenings. The age of the participants varied from 39 to 74, with a mean age of 56.48 ± 8.96 years and a median age of 56.50. More details about the age of the participants included in the study can be found in Table I. Half the cases corresponded to healthy individuals without any signs of malignancy in both screening rounds. The remaining cases belonged to patients with at least one suspicious mass present in the most recent mammogram. This unique dataset includes detailed annotations of the boundary of each mass (both benign and suspicious) to be used as the ground truth (Fig. 1). The dimensions of the mammograms were 4096×3328 pixels, in 12-bit DICOM format.

B. Mass Detection and Classification using Temporal Subtraction
In temporal subtraction, the two corresponding images of each pair were processed in parallel. Fig. 2 demonstrates the methodology followed. The pre-processing began with the normalization of the images, to adjust the range of pixel intensity values, followed by Contrast Limited Adaptive Histogram Equalization (CLAHE), Gamma correction and border removal. CLAHE is based on the re-allocation of the gray levels inside an image using the probability distribution of the input gray levels. With this technique, the maximum contrast enhancement factor is limited by a user-defined clip level. CLAHE effectively eliminated the noise and diminished the edge-shadowing effect [9]. Contrast adjustment using Gamma correction was used to account for the non-linear mapping of intensities in the images:

$$ I_{\gamma} = l_{max} \left( \frac{I}{l_{max}} \right)^{\gamma} \qquad (1) $$

where $I$ is the input image, $l_{max}$ is the maximum intensity value of the image and $\gamma$ the Gamma parameter, which was set to 2 [10]. Border removal removed the high-intensity areas connected to the border (including the pectoral muscle) from the original image $I$, with $F$ denoting the border areas [11].

For effective image subtraction, precise image registration is required. Registration compensates for patient movement, shape alterations, differences in the amount of the pectoral muscle in the MLO view, variations in breast compression and human error. Various registration methods are available for the alignment of prior and recent mammograms [12]. In this study, two registration algorithms were assessed: Affine [12] and Demons [13]. The two techniques were compared based on the residuals (sum of the remaining pixels after subtraction) and, in all cases, the prior image was registered to the recent one.
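The normalization and Gamma correction steps can be illustrated with a minimal sketch. It assumes the common power-law form of Gamma correction, out = l_max (in / l_max)^γ, with γ = 2 as stated in the text. The function names `normalize` and `gamma_correct` and the toy patch are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
def normalize(image):
    """Rescale pixel intensities linearly to the range [0, 1]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid division by zero
        return [[0.0 for _ in row] for row in image]
    return [[(p - lo) / (hi - lo) for p in row] for row in image]


def gamma_correct(image, gamma=2.0, l_max=1.0):
    """Apply out = l_max * (in / l_max) ** gamma to every pixel.

    The power-law form is an assumption of this sketch; gamma = 2 is the
    value quoted in the text.
    """
    return [[l_max * (p / l_max) ** gamma for p in row] for row in image]


# Toy 12-bit patch (values in 0..4095), standing in for a mammogram region.
patch = [[0, 1024, 2048],
         [3072, 4095, 2048]]

corrected = gamma_correct(normalize(patch), gamma=2.0)
```

With γ greater than 1, mid-range intensities are pushed down while the darkest and brightest values stay fixed, which stretches the contrast among the brighter (denser) regions.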
Affine is an intensity-based linear-mapping registration technique that consists of scaling, translation and rotation. Affine preserves planes, straight lines and points and can be expressed as:

$$ x' = a_1 x + a_2 y + t_x, \qquad y' = b_1 x + b_2 y + t_y \qquad (3) $$

where $t_x$ and $t_y$ are the translations, $x$ and $y$ are the original points, $x'$ and $y'$ are the transformed points and $a_1$, $a_2$, $b_1$, $b_2$ are the affine parameters [12]. In contrast, Demons is a local registration technique that aligns the moving image (prior) to the fixed one (recent), using regional similarity and location. In Demons, the registration is seen as a diffusion process and it can be represented as the energy function with respect to the update field $u$, using fixed image $F$, moving image $M$ and a transformation field $s$ as follows:

$$ E(u) = \| F - M \circ (s + u) \|^2 + \frac{\sigma_i^2}{\sigma_x^2} \| u \|^2 \qquad (4) $$

where $\sigma_i^2$ is the noise on the image intensity and $\sigma_x^2$ the spatial uncertainty. Using Taylor expansion, (4) can be linearized and the energy function reaches its minimum where its gradient with respect to $u$ is zero. The registration must be solved iteratively, since the update field is based on local information [13].
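As an illustration of the affine mapping only (not of the intensity-based optimization that estimates its parameters), the sketch below applies x' = a1 x + a2 y + tx and y' = b1 x + b2 y + ty to a few points. The helper names `affine_transform` and `similarity_params`, and the chosen scale, angle and translation, are hypothetical.

```python
import math


def affine_transform(points, a1, a2, b1, b2, tx, ty):
    """Map each (x, y) to (a1*x + a2*y + tx, b1*x + b2*y + ty)."""
    return [(a1 * x + a2 * y + tx, b1 * x + b2 * y + ty) for x, y in points]


def similarity_params(scale, angle_deg, tx, ty):
    """Affine parameters for uniform scaling, rotation and translation."""
    theta = math.radians(angle_deg)
    return dict(a1=scale * math.cos(theta), a2=-scale * math.sin(theta),
                b1=scale * math.sin(theta), b2=scale * math.cos(theta),
                tx=tx, ty=ty)


# Three reference points, scaled by 2, rotated by 90 degrees and shifted.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
params = similarity_params(scale=2.0, angle_deg=90.0, tx=5.0, ty=-1.0)
moved = affine_transform(corners, **params)
```

Because the mapping is linear plus a shift, straight lines remain straight, which is why a global affine model cannot follow the local tissue deformations that Demons is designed to capture.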
Registration was followed by temporal subtraction, subtracting the prior registered image from the recent one, to eliminate all the areas that remained unchanged. At this point, the CR of the subtracted image was compared to the corresponding CR of the most recent image, to evaluate the image enhancement of the method. High intensity areas on the periphery of the breast were automatically removed by detecting the breast periphery from a binary image of the entire breast, and subsequently by removing all the high intensity regions that fall on the periphery. This approach was critical since the regions that appeared in the periphery corresponded to skin areas that cannot contain masses, and emerged from misalignment.
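A minimal sketch of the subtraction step and of the residual used to compare registration methods is given below. It assumes that the prior image has already been registered, and it clips negative differences to zero, which is an assumption of this sketch rather than a detail stated in the text. The names `temporal_subtraction` and `residual` are hypothetical.

```python
def temporal_subtraction(recent, prior_registered):
    """Subtract the registered prior image from the recent one, pixel by pixel.

    Negative differences are clipped to zero (an assumption of this sketch),
    so only regions that became brighter in the recent image remain.
    """
    return [[max(r - p, 0) for r, p in zip(row_r, row_p)]
            for row_r, row_p in zip(recent, prior_registered)]


def residual(subtracted):
    """Sum of the pixel values that remain after subtraction."""
    return sum(sum(row) for row in subtracted)


# Toy 3x3 images: one new bright spot appears in the centre of the recent image.
prior = [[10, 10, 10], [10, 12, 10], [10, 10, 10]]
recent = [[10, 10, 10], [10, 40, 11], [10, 9, 10]]
diff = temporal_subtraction(recent, prior)
```

A better registration leaves fewer unchanged structures in the difference image and therefore yields a lower residual.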
Afterwards, the subtracted image was processed using unsharp-mask filtering, a spatial filter that enhances a specific range of frequencies inside an image. Unsharp masking can be expressed as:

$$ f_{sharp}(x, y) = f(x, y) + k \, g(x, y) \qquad (5) $$

where $k$ is the scaling constant (can be set from 0.2 to 0.7) and $g(x, y)$ is calculated as follows:

$$ g(x, y) = f(x, y) - f_{smooth}(x, y) \qquad (6) $$

where $f(x, y)$ is the input image and $f_{smooth}(x, y)$ is the smoothed version of the input [14]. The first step in the identification of the margin of the breast masses was the conversion to binary using thresholding. To select the most suitable threshold value, Otsu's method was applied using the discriminant criterion [15]. Morphological operations, such as opening and closing, were then applied. At this point, the detection was complete and the remaining ROIs were considered as possible masses.
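The thresholding step can be sketched as follows. This is a plain histogram-based version of Otsu's method that picks the gray level maximizing the between-class variance; the function names `otsu_threshold` and `binarize`, the 256-level assumption and the toy image are hypothetical and only illustrate the principle.

```python
def otsu_threshold(image, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form the background class, the rest the foreground.
    """
    hist = [0] * levels
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the background class
    sum0 = 0  # sum of gray levels in the background class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # background class still empty
        if w1 == 0:
            break     # foreground class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to the variance
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(image, t):
    """Set pixels above the threshold to 1 and all others to 0."""
    return [[1 if p > t else 0 for p in row] for row in image]


# Toy 8-bit image with a dark background and a bright region on the right.
image = [[10, 12, 11, 200],
         [9, 11, 210, 205],
         [10, 10, 12, 198]]
t = otsu_threshold(image)
mask = binarize(image, t)
```

The resulting binary mask is the input on which morphological opening and closing would then operate.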
Machine learning was performed to reduce the large number of False Positive (FP) detections. In total, 28 features were extracted from four different categories: shape, intensity-based, First-Order Statistics (FOS) and Gray Level Co-occurrence Matrix (GLCM) features. Details about the extracted features can be found in Table II. These features were selected for their capability to identify the masses based on their unique characteristics. Feature selection was then applied to eliminate the least important features, in order to increase the classification performance. The most statistically significant features (p < 0.01) were selected and the optimal combination was determined during the classification process, by optimizing the accuracy of the results during training.
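To make the GLCM category concrete, the sketch below builds a small co-occurrence matrix and computes two texture features. It assumes a symmetric, normalized matrix for a single horizontal offset, energy defined as the sum of squared entries and homogeneity as the sum of p(i, j) / (1 + |i - j|). Other conventions exist, and the source does not state which one was used. All function names are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                i, j = image[y][x], image[ny][nx]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(sum(r) for r in counts)
    return [[c / total for c in r] for r in counts]


def energy(p):
    """Sum of squared GLCM entries (high for uniform textures)."""
    return sum(v * v for row in p for v in row)


def homogeneity(p):
    """Weights each entry by its closeness to the GLCM diagonal."""
    return sum(v / (1 + abs(i - j))
               for i, row in enumerate(p) for j, v in enumerate(row))


# Two toy ROIs quantized to 2 gray levels: a flat one and a checkerboard.
flat_roi = [[0, 0], [0, 0]]
checker_roi = [[0, 1], [1, 0]]
p_flat = glcm(flat_roi, levels=2)
p_checker = glcm(checker_roi, levels=2)
```

The flat ROI gives the maximum value of 1 for both features, while the checkerboard, whose horizontal neighbours always differ, scores lower on both.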
Six classifiers were assessed: Linear Discriminant Analysis (LDA) [16], k-Nearest Neighbors (k-NN) [17], Naive Bayes (NB) [17], Support Vector Machines (SVMs) [18], Decision Trees (DTs) [19] and Ensemble of Decision Trees (EDT) [20], with leave-one-patient-out cross-validation. Each time the four images of a single patient (CC and MLO views of the most recent and prior mammograms) were used as the test sample, while the remaining patients were used for the training of the model, until all the cases were classified. With this approach, bias is avoided since all the ROIs of a single patient were used either in the training or in the testing, but not both. In the first classification round, the detected ROIs were classified as masses or normal tissue and in the second round, the true masses were characterized as benign or suspicious. However, since the dataset was limited, 3-class classification was also performed and the ROIs were automatically classified into normal tissue, benign masses and suspicious masses, in one round. The performance of classification was evaluated by measuring sensitivity, specificity, accuracy and the Area Under the Receiver Operating Characteristics (ROC) Curve (AUC).
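The leave-one-patient-out scheme can be sketched as a simple grouping of ROIs by patient. The sample layout (patient ID, feature list, label), the IDs and the feature values below are hypothetical; the point is only that every ROI of the held-out patient goes to the test set and none to the training set.

```python
def leave_one_patient_out(samples):
    """Yield (held_out_patient, train, test) splits grouped by patient ID.

    `samples` is a list of (patient_id, features, label) tuples, one per ROI.
    """
    patient_ids = sorted({pid for pid, _, _ in samples})
    for held_out in patient_ids:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


# Hypothetical ROIs from three patients (feature values are made up).
rois = [
    ("P01", [0.2, 1.5], "normal"),
    ("P01", [0.9, 3.1], "suspicious"),
    ("P02", [0.3, 1.2], "normal"),
    ("P03", [0.7, 2.8], "benign"),
    ("P03", [0.1, 1.0], "normal"),
]
splits = list(leave_one_patient_out(rois))
```

Splitting by patient rather than by ROI is what prevents regions from the same images from appearing on both sides of the split.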

C. Mass Detection and Classification using Temporal Analysis
Temporal analysis was proposed in the literature as an effective technique for the detection of breast masses using sequential mammograms. Using this technique, the prior mammogram is used as a reference for the creation of a new feature vector, by subtracting the numerical values of the features extracted from the prior image from the corresponding ones of the recent image. This approach was applied to compare its efficiency to that of the proposed temporal subtraction. Hence, the same features were extracted from both the recent and the registered prior image, and then subtracted, to create a new temporal feature vector. The previously described classification methodology was optimized for the temporal analysis method and the same procedure was followed for the validation and evaluation, so that the results would be comparable.
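The construction of the temporal feature vector can be sketched as an element-wise difference between matched ROI features. The feature names and values below are hypothetical placeholders, not the features of Table II.

```python
def temporal_feature_vector(recent_features, prior_features):
    """Subtract each prior-ROI feature from the matching recent-ROI feature."""
    if recent_features.keys() != prior_features.keys():
        raise ValueError("both ROIs must provide the same features")
    return {name: recent_features[name] - prior_features[name]
            for name in recent_features}


# Hypothetical feature values for one ROI and its counterpart in the prior image.
recent_roi = {"area": 420.0, "mean_intensity": 0.71, "glcm_energy": 0.12}
prior_roi = {"area": 300.0, "mean_intensity": 0.65, "glcm_energy": 0.15}
delta = temporal_feature_vector(recent_roi, prior_roi)
```

Unlike temporal subtraction, this difference is taken in feature space, so it only carries information about regions that were already selected as ROIs.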

III. RESULTS
A. Mass Detection and Classification using Temporal Subtraction
Of the two registration methods evaluated, Demons registration outperformed Affine, successfully tracking the image changes that occurred between the two screenings. As seen in Fig. 3, the residuals of the subtraction confirmed that the Demons approach eliminated the unchanged regions more effectively in both dense and fatty breasts (overall residual of 28.5% with Demons vs. 42% with Affine). The performance of temporal subtraction was also evaluated by measuring the CR of the subtracted image and comparing it to the corresponding CR of the most recent mammogram without any processing (Figs. 4 and 5). Overall, the CR increased ∼7 times (from 11.29 to 83.1) for dense and fatty mammograms, resulting in a new image accentuating the changes between the screenings.
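As a quick check of the reported contrast improvement, the ratio of the two CR values quoted above can be worked out directly (the subscripts are labels introduced here for readability):

```latex
\[
\frac{\mathrm{CR}_{\text{subtracted}}}{\mathrm{CR}_{\text{recent}}}
  = \frac{83.1}{11.29} \approx 7.36
\]
```

This is consistent with the approximately sevenfold increase stated in the text.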
Features were extracted from the images and, using the feature selection algorithms, the best features for the elimination of FPs were determined (Table II). Using those, EDT achieved 97.44% accuracy and 0.92 AUC (Fig. 6). The true masses were then further categorized as benign or suspicious, and EDT reached 83.93% accuracy with 0.74 AUC (Fig. 6). In the 3-class classification (normal, benign, suspicious), the performance, using EDT and the same features as before, reached a high of 96.51% accuracy and 0.91 AUC (Fig. 6).

B. Mass Detection and Classification using Temporal Analysis
Temporal analysis was also performed for comparison purposes. For the elimination of FPs, the same features as temporal subtraction were used. EDT achieved the highest performance with 87.04% accuracy and 0.85 AUC (Fig. 7). For the classification of masses as benign or suspicious, an EDT model reached 50% accuracy and 0.44 AUC (Fig. 7), using again the same features as temporal subtraction. In the 3-class classification, 90.83% accuracy and 0.87 AUC (Fig. 7) were achieved with the implementation of EDT and, again, the same selected features.

IV. DISCUSSION
As expected, the implementation of temporal subtraction increased the CR, effectively removed the background and rendered the new changes more visible to the radiologists. The highest classification performance was achieved using EDT in a 3-class model (96.51% accuracy), with 2.1 FP detections per image. These results are far superior to those obtained using temporal analysis (Fig. 8). With temporal analysis, even though EDT provided 90.83% accuracy in the 3-class classification, 5 benign masses (out of 12) were misclassified and 5 suspicious masses (out of 44) were misclassified as benign; in total, there were 5.6 FP detections per image. Using temporal subtraction, the average accuracy of mass classification increased by approximately 6 percentage points, a statistically significant improvement (p < 0.05).
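The size of the improvement can be verified from the figures reported above. The accuracy gap and the reduction in FP detections per image work out as follows:

```latex
\begin{align*}
\Delta_{\text{accuracy}} &= 96.51\% - 90.83\% = 5.68 \text{ percentage points}, \\
\frac{5.68}{90.83} &\approx 6.25\% \quad \text{(relative increase)}, \\
\frac{5.6 - 2.1}{5.6} &= 62.5\% \quad \text{(relative reduction in FP detections per image)}.
\end{align*}
```

Both the absolute and the relative accuracy differences round to the improvement of about 6% quoted in the text.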
Temporal subtraction has recently been shown to be effective for the detection and classification of micro-calcifications [8]. Studies on temporal subtraction for masses were not found in the literature; hence, direct comparison with other studies was not possible. Some groups assessed the use of temporal analysis for the detection of masses with promising results (0.88 AUC [5], 0.77 AUC [6], 0.90 AUC [7]), comparable to the 0.87 AUC obtained with temporal analysis in this study. The results presented in this paper indicate that temporal subtraction can achieve higher performance (0.91 AUC) and can improve the diagnosis of breast masses.
It is important to note that, in some studies, the ROIs were randomly divided for the validation and training. Parts of the same image, thus from the same patient, were used in both training and testing of the classification model. In this work, leave-one-patient-out cross-validation was used, dividing the patients and not the ROIs into testing and training, to avoid any bias.

V. CONCLUSION
In this work, a new dataset consisting of 40 pairs of digital sequential mammograms was created to evaluate the breast mass classification performance of temporal subtraction, compared to temporal analysis, in combination with image registration and machine learning. With temporal subtraction, the unchanged details in the mammogram, e.g. the background, were eliminated and the CR improved by ∼7x. The classification accuracy increased by approximately 6 percentage points compared to temporal analysis (96.51% vs. 90.83%). Moreover, with the implementation of temporal subtraction, the average number of FP detections per mammogram decreased from 5.6 to 2.1. However, these results are preliminary, given the small dataset considered. Further studies must therefore be conducted to include more cases and different validation methods. The further expansion and improvement of the findings of this study have the potential to contribute substantially to the development of automated CAD systems with significant impact on patient prognosis.