Published January 30, 2025 | Version v1
Dataset Open

Breast cancer dataset

  • 1. Universiti Kebangsaan Malaysia

Description

The dataset used in this study, obtained from Huang and Lin (2020), consists of 7,632 mammogram images in two classes: 2,520 benign and 5,112 malignant. The images derive from the INbreast database, whose mammograms were originally collected at the Centro Hospitalar de S. João (CHSJ) Breast Center in Porto.

The INbreast database contains data collected from August 2008 to July 2010 and includes 115 cases with a total of 410 images (Moreira et al., 2012). Of these, 90 cases concern women with abnormalities in both breasts. Four types of breast findings are recorded in the database: masses, calcifications, asymmetries and distortions. The mammograms are recorded from two standard views: craniocaudal (CC) and mediolateral oblique (MLO). In addition, breast density is classified into four categories based on the BI-RADS standard: Almost Entirely Fatty (Density 1), Scattered Fibroglandular Density (Density 2), Heterogeneously Dense (Density 3) and Extremely Dense (Density 4). The images are stored in DICOM format at one of two resolutions: 3328 x 4084 pixels or 2560 x 3328 pixels.

From the INbreast database, 106 mammograms depicting breast masses were selected. Data augmentation was then applied to enlarge the dataset for model training, increasing the total number of breast mammography images to 7,632.
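The counts quoted in the description can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the published figures. The variable names are illustrative, and the per-class estimate of source mammograms assumes that every selected mammogram was augmented by the same factor, which the description does not state.

```python
from fractions import Fraction

# Figures taken from the description above.
TOTAL_IMAGES = 7632
BENIGN_IMAGES = 2520
MALIGNANT_IMAGES = 5112
SOURCE_MAMMOGRAMS = 106  # mass-containing mammograms selected from INbreast

# The two classes should account for every image.
classes_sum_to_total = BENIGN_IMAGES + MALIGNANT_IMAGES == TOTAL_IMAGES

# Average number of augmented images produced per selected mammogram.
augmentation_factor = Fraction(TOTAL_IMAGES, SOURCE_MAMMOGRAMS)

# Class shares, useful when choosing class weights or a stratified split.
benign_share = BENIGN_IMAGES / TOTAL_IMAGES
malignant_share = MALIGNANT_IMAGES / TOTAL_IMAGES

# ASSUMPTION: uniform augmentation per mammogram (not stated in the source).
# Under that assumption, the per-class source counts would be:
benign_sources = Fraction(BENIGN_IMAGES) / augmentation_factor
malignant_sources = Fraction(MALIGNANT_IMAGES) / augmentation_factor

if __name__ == "__main__":
    print(f"classes sum to total: {classes_sum_to_total}")
    print(f"augmentation factor:  {augmentation_factor}")
    print(f"benign share:         {benign_share:.3f}")
    print(f"malignant share:      {malignant_share:.3f}")
    print(f"estimated sources:    {benign_sources} benign, {malignant_sources} malignant")
```

The figures are internally consistent: the class counts sum to the stated total, and the total is an exact multiple of the 106 selected mammograms. The dataset is imbalanced at roughly one benign image for every two malignant ones, which may be worth accounting for during training and evaluation.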

Files

BreastCancer_Benign.zip

Files (76.7 MB)

Checksum Size
md5:434f0489c24028630435a86db9d810ef 24.9 MB
md5:31abc76a6902bf1ced90b4f2b5effcf3 51.8 MB
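After downloading, the archives can be checked against the MD5 checksums listed above. The following is a minimal sketch using only the Python standard library. The function names `md5_of_file` and `is_listed_archive` are illustrative, and because this listing does not pair each checksum with a file name, the check only confirms that a file matches one of the two published digests.

```python
import hashlib

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "434f0489c24028630435a86db9d810ef",  # 24.9 MB archive
    "31abc76a6902bf1ced90b4f2b5effcf3",  # 51.8 MB archive
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed_archive(path):
    """Return True if the file's MD5 digest matches one of the published checksums."""
    return md5_of_file(path) in EXPECTED_MD5
```

A call such as `is_listed_archive("BreastCancer_Benign.zip")` would return `True` for an intact download, assuming the file is saved under that name in the working directory. MD5 is adequate for detecting a corrupted transfer, but it is not a safeguard against deliberate tampering.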