Published May 28, 2025 | Version v2
Dataset Restricted

Comprehensive High-Resolution Eggplant Leaf Image Dataset for Plant Disease Detection

Description

This is a comprehensive version of the Eggplant Leaf Image Dataset, designed to support machine learning and deep learning research in agriculture, plant pathology, and computer vision. To address class imbalance and model generalization challenges, the dataset expands 255 original images to 2,180 through controlled data augmentation.

The dataset includes a total of 2,180 high-resolution images (6000×4000 pixels), categorized into six disease or health classes of Solanum melongena (eggplant) leaves:

Class        | Original Images | Augmented Images | Total Images
Healthy      | 80              | 320              | 400
Insect-Pest  | 40              | 320              | 360
Leaf-Spot    | 50              | 300              | 350
Mosaic-Virus | 15              | 345              | 360
Small-Leaf   | 20              | 340              | 360
Wilt         | 50              | 300              | 350
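
The per-class figures can be cross-checked against the stated total of 2,180 images. The following is a minimal sketch that copies the counts from the table and derives class totals and the number of augmented images per original; the names CLASS_COUNTS and summarize are illustrative and not part of the dataset.

```python
# Per-class counts copied from the table: (original, augmented).
CLASS_COUNTS = {
    "Healthy": (80, 320),
    "Insect-Pest": (40, 320),
    "Leaf-Spot": (50, 300),
    "Mosaic-Virus": (15, 345),
    "Small-Leaf": (20, 340),
    "Wilt": (50, 300),
}


def summarize(counts):
    """Return per-class totals and augmented-per-original ratios, plus grand totals."""
    rows = {}
    for name, (orig, aug) in counts.items():
        rows[name] = {"total": orig + aug, "ratio": aug / orig}
    n_orig = sum(orig for orig, _ in counts.values())
    n_aug = sum(aug for _, aug in counts.values())
    return rows, n_orig, n_aug


rows, n_orig, n_aug = summarize(CLASS_COUNTS)
for name, row in rows.items():
    print(f"{name:<13} total={row['total']:>4}  aug/orig={row['ratio']:.1f}")
print(f"originals={n_orig}  augmented={n_aug}  all={n_orig + n_aug}")
```

The ratios range from 4 augmented images per original (Healthy) to 23 (Mosaic-Virus), so the smallest classes depend most heavily on augmentation. This is worth keeping in mind when splitting data for evaluation.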

All original images were captured using a Canon EOS 1300D DSLR camera under consistent natural lighting conditions. Files are saved in JPG format, and image resolution is preserved within ±5% of the original dimensions to maintain visual fidelity.

To improve dataset usability for robust model training and generalization, controlled data augmentation was applied using the Albumentations library. The transformations include random rotation, horizontal flipping, brightness/contrast adjustments, slight color shifts, and padding to maintain aspect ratio. All augmentation procedures were consistently applied and seeded for reproducibility. Augmentation parameters are documented in detail in the metadata.
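
For orientation, a pipeline of this kind might look like the sketch below. It uses the Albumentations transforms that correspond to the operations named above, but every numeric value in AUG_PARAMS is an assumption made for illustration; the settings actually used are the ones recorded in the dataset's metadata. Seeding is left out because how it is set depends on the installed Albumentations version.

```python
# Hypothetical parameter values for illustration only; the dataset's real
# settings are those documented in its metadata.
AUG_PARAMS = {
    "rotate_limit": 15,       # degrees, assumed
    "flip_p": 0.5,            # assumed
    "brightness_limit": 0.2,  # assumed
    "contrast_limit": 0.2,    # assumed
    "hue_shift_limit": 5,     # assumed
    "pad_height": 4000,       # from the stated 6000x4000 resolution
    "pad_width": 6000,
}


def build_pipeline(params=AUG_PARAMS):
    """Compose the transform types named in the description (sketch, not the authors' pipeline)."""
    import albumentations as A  # imported lazily so this module loads without the library

    return A.Compose([
        A.Rotate(limit=params["rotate_limit"], p=0.5),
        A.HorizontalFlip(p=params["flip_p"]),
        A.RandomBrightnessContrast(
            brightness_limit=params["brightness_limit"],
            contrast_limit=params["contrast_limit"],
            p=0.5,
        ),
        A.HueSaturationValue(
            hue_shift_limit=params["hue_shift_limit"],
            sat_shift_limit=10,  # assumed
            val_shift_limit=5,   # assumed
            p=0.5,
        ),
        # Pads only when the image is smaller than the target size.
        A.PadIfNeeded(min_height=params["pad_height"], min_width=params["pad_width"]),
    ])
```

With an image loaded as an H×W×3 uint8 NumPy array, the pipeline would be applied as `augmented = build_pipeline()(image=image)["image"]`.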

The metadata.csv file provides a class-wise summary: original image count, augmented image count, augmentation ratio, and the exact augmentation pipeline used.
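
The column names in metadata.csv are not listed in this record, so the sketch below reads the file without assuming any of them; each row becomes a dictionary keyed by whatever header the file contains. The function name load_metadata is illustrative.

```python
import csv
from pathlib import Path


def load_metadata(path):
    """Read a CSV file into a list of dicts keyed by the file's own header row."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling `load_metadata("metadata.csv")` and inspecting `rows[0].keys()` shows the available columns before any of them are relied on.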

Note: The original and augmented images are stored in separate folders under the "Original" and "Augmented" directories, respectively. Each directory is organized into six class-specific subfolders: Healthy, Insect-Pest, Leaf-Spot, Mosaic-Virus, Small-Leaf, and Wilt. Augmented images are clearly distinguishable by the inclusion of the substring "_aug_" in their filenames. This clear separation ensures reproducibility, transparency in data provenance, and ease of use for researchers who may wish to train models using only original, only augmented, or both types of data.
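
The folder layout and the "_aug_" filename marker make it straightforward to select a subset. The following is a minimal sketch, assuming the archive has been extracted to a local folder and that images carry a .jpg or .jpeg extension in either case; the helper names list_images and is_augmented are illustrative.

```python
from pathlib import Path

CLASSES = ["Healthy", "Insect-Pest", "Leaf-Spot", "Mosaic-Virus", "Small-Leaf", "Wilt"]
IMAGE_SUFFIXES = {".jpg", ".jpeg"}  # assumed; the record only states "JPG format"


def is_augmented(path):
    """True when the filename carries the "_aug_" marker described above."""
    return "_aug_" in Path(path).name


def list_images(root, include_original=True, include_augmented=True):
    """Return (path, class_name) pairs from the Original/ and Augmented/ trees."""
    subsets = []
    if include_original:
        subsets.append("Original")
    if include_augmented:
        subsets.append("Augmented")

    samples = []
    for subset in subsets:
        for class_name in CLASSES:
            class_dir = Path(root) / subset / class_name
            if not class_dir.is_dir():
                continue  # tolerate a partially extracted archive
            for path in sorted(class_dir.iterdir()):
                if path.suffix.lower() in IMAGE_SUFFIXES:
                    samples.append((path, class_name))
    return samples
```

For example, `list_images(root, include_augmented=False)` yields only field-captured images. This is useful for building a validation set free of augmented variants of training images.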

Files:

  • EggplantLeaf-ImageDataset.zip: Contains all files and folders, including Original, Augmented, metadata.csv, and Readme.md.
  • Original: Contains only raw field-captured images, grouped by class.
  • Augmented: Contains the synthetically expanded images, also organized by class. Augmented filenames include the marker "_aug_" for easy identification.
  • metadata.csv: Class-level summary and augmentation details.
  • Readme.md: Technical documentation and usage notes.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

Dates

Collected
2025-04-12