Published December 30, 2025 | Version v1 | Dataset | Open
A machine vision dataset for automated quality inspection and grading of sweetpotatoes
Description
Dataset Structure
SP_MVDataset.7z
├─ Subset A
│   ├─ Sweetpotato Sampling Datasheet.docx
│   │
│   ├─ Images
│   │     Batch_01_Frame_001.png
│   │     Batch_01_Frame_002.png
│   │     ...
│   │     Batch_19_Frame_012.png
│   │
│   ├─ Labels
│   │     Batch_01_Frame_001.json
│   │     Batch_01_Frame_002.json
│   │     ...
│   │     Batch_19_Frame_012.json
│   │
│   └─ Videos
│         Batch_01.avi
│         Batch_02.avi
│         ...
│         Batch_20.avi
│
└─ Subset B
    ├─ Images
    │     Batch_01_Frame_001.png
    │     Batch_01_Frame_002.png
    │     ...
    │     Batch_20_Frame_039.png
    │
    ├─ Labels
    │     Batch_01_Frame_001.json
    │     Batch_01_Frame_002.json
    │     ...
    │     Batch_20_Frame_039.json
    │
    └─ Videos
          Batch_01.avi
          Batch_02.avi
          ...
          Batch_20.avi
Dataset Organization
When extracted from the archive, the dataset is organized into two main components:
- Subset A: Contains data from 123 commercial sweetpotatoes purchased from a grocery store, imaged under ambient indoor lighting.
- Subset B: Contains data from 267 freshly harvested sweetpotatoes from a research station, imaged in an enclosed LED chamber.
Subset A also includes a supplementary file, "Sweetpotato_Sampling_Datasheet.docx", which provides population statistics (weight, length, and width) and surface conditions for its samples. For both subsets, the raw video recordings are provided for further research.
Directory Hierarchy
Within the subset directories, the data is further organized into subfolders for images and annotations:
- `Subset_A/Images`: Contains 232 RGB frames (Resolution: 1920×1080 pixels).
- `Subset_A/Labels`: Contains 232 corresponding JSON annotation files.
- `Subset_A/Videos`: Contains 19 corresponding raw video files.
- `Subset_B/Images`: Contains 1168 RGB frames (Resolution: 1280×720 pixels).
- `Subset_B/Labels`: Contains 1168 corresponding JSON annotation files.
- `Subset_B/Videos`: Contains 20 corresponding raw video files.
Each image is a standard RGB .png file. The samples were rotated on a roller conveyor during acquisition to capture full-surface views.
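After extracting the archive, a quick sanity check is to compare the number of files in each subfolder with the counts listed above. The following is a minimal, unofficial sketch in Python. It assumes the subset folders are named `Subset_A` and `Subset_B` as in the list above; since the directory tree shows `Subset A` with a space, the hypothetical `dir_names` argument maps each subset key to whatever folder name actually appears on disk. `count_files` and `mismatches` are illustrative helpers, not part of the dataset.

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# Counts as listed in the "Directory Hierarchy" section.
EXPECTED = {
    "Subset_A": {"Images": 232, "Labels": 232, "Videos": 19},
    "Subset_B": {"Images": 1168, "Labels": 1168, "Videos": 20},
}
# File extension expected in each subfolder.
SUFFIX = {"Images": ".png", "Labels": ".json", "Videos": ".avi"}


def count_files(root, dir_names: Optional[Dict[str, str]] = None) -> Dict[str, Dict[str, int]]:
    """Count files with the expected extension in each subset subfolder.

    dir_names maps a subset key to the folder name on disk,
    e.g. {"Subset_A": "Subset A"} (an assumption about the extracted layout).
    Missing folders are counted as 0.
    """
    dir_names = dir_names or {}
    counts: Dict[str, Dict[str, int]] = {}
    for subset in EXPECTED:
        base = Path(root) / dir_names.get(subset, subset)
        counts[subset] = {}
        for folder, suffix in SUFFIX.items():
            d = base / folder
            counts[subset][folder] = sum(1 for _ in d.glob("*" + suffix)) if d.is_dir() else 0
    return counts


def mismatches(counts: Dict[str, Dict[str, int]]) -> List[Tuple[str, str, int, int]]:
    """List (subset, folder, expected, found) wherever a count differs from EXPECTED."""
    return [
        (s, f, n, counts[s][f])
        for s, folders in EXPECTED.items()
        for f, n in folders.items()
        if counts[s][f] != n
    ]
```

An empty list from `mismatches(count_files("path/to/extracted"))` means every subfolder matches the documented counts.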
Dataset Summary
- Total samples: 390 (123 in Subset A, 267 in Subset B)
- Total images: 1,400
- Total annotated instances: 3,700
- Total videos: 39
- Storage space required: Approximately 6.54 GB (uncompressed)
File Naming Convention
The file naming convention is consistent across images and annotation files to ensure traceability to the original video batches. Each file name includes two key elements:
1. Batch ID: Identifies the specific group or video sequence (e.g., "Batch_01").
2. Frame Sequence: Gives the position of the extracted frame within its batch (e.g., "Frame_001").
Examples
- Batch_01_Frame_001.png: The first frame extracted from the Batch 1 video.
- Batch_01_Frame_001.json: The corresponding annotation file for the image above.
Note: Files in Subset A and Subset B share this naming convention, so the same file name (e.g., Batch_01_Frame_001.png) can appear in both subsets. They are stored in separate parent directories, which identify the domain/source.
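Because the convention is fixed, file names can be parsed mechanically. A minimal sketch, assuming every image and label name matches `Batch_NN_Frame_NNN` with a `.png` or `.json` extension, and that video names use two-digit batch numbers as in `Batch_01.avi`. `FrameId`, `parse_frame_name`, `video_name`, and `label_name` are hypothetical names chosen for illustration. The subset is passed in separately because it is encoded only by the parent directory.

```python
import re
from typing import NamedTuple

# Matches names such as "Batch_01_Frame_001.png" or "Batch_20_Frame_039.json".
FRAME_NAME = re.compile(r"^Batch_(\d+)_Frame_(\d+)\.(png|json)$")


class FrameId(NamedTuple):
    subset: str  # taken from the parent directory; not encoded in the file name
    batch: int
    frame: int


def parse_frame_name(subset: str, filename: str) -> FrameId:
    """Split an image or label file name into its batch and frame numbers."""
    m = FRAME_NAME.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return FrameId(subset, int(m.group(1)), int(m.group(2)))


def video_name(fid: FrameId) -> str:
    """Source video file for a frame, assuming two-digit zero padding (Batch_01.avi)."""
    return f"Batch_{fid.batch:02d}.avi"


def label_name(fid: FrameId) -> str:
    """Matching annotation file name, assuming two- and three-digit zero padding."""
    return f"Batch_{fid.batch:02d}_Frame_{fid.frame:03d}.json"
```

For example, `parse_frame_name("Subset_A", "Batch_01_Frame_001.png")` returns `FrameId(subset='Subset_A', batch=1, frame=1)`, and `video_name` of that result is `"Batch_01.avi"`.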
Annotation Structure
The annotation files (`.json`) follow the standard LabelMe format and can be opened with common annotation tools such as LabelMe and AnyLabeling. Each file contains:
1. `shapes`: A list of annotated instances, each with:
   - `label`: The visual quality category (Grade 1, Grade 2, or Grade 3).
   - `points`: A list of [x, y] coordinates defining the polygon mask around the sweetpotato instance.
   - `shape_type`: "polygon".
2. `imagePath`: References the corresponding .png image file.
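A minimal sketch for reading one annotation file with Python's standard `json` module, assuming the usual LabelMe keys (`shapes`, `label`, `points`, `shape_type`, `imagePath`). `load_instances` and `polygon_bbox` are hypothetical helpers; the bounding-box conversion is an illustrative addition for detection-style training, not something the dataset provides.

```python
import json
from typing import List, Tuple

Point = Tuple[float, float]


def load_instances(json_path) -> Tuple[str, List[Tuple[str, List[Point]]]]:
    """Return (imagePath, [(label, polygon), ...]) from one LabelMe-style JSON file."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    instances = []
    for shape in data.get("shapes", []):
        # The dataset documents polygons only; skip any other shape type.
        if shape.get("shape_type", "polygon") != "polygon":
            continue
        polygon = [(float(x), float(y)) for x, y in shape["points"]]
        instances.append((shape["label"], polygon))
    return data.get("imagePath", ""), instances


def polygon_bbox(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing a polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return min(xs), min(ys), max(xs), max(ys)
```

Reading the label text straight from each shape keeps this sketch independent of how the grade strings are spelled in practice.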
Class Definitions
Each sweetpotato instance is assigned to one of three visual categories based on the surface defects visible in that particular frame:
- Grade 1 (Normal): High-quality instances with no visible defects or only negligible surface imperfections.
- Grade 2 (Moderate Defects): Instances with visible surface defects that affect appearance (e.g., minor scuffs, skinning).
- Grade 3 (Severe Defects): Instances with significant damage or decay visible (e.g., deep cuts, rot, severe mechanical damage).
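Most training frameworks need integer class indices instead of label strings. A minimal sketch, assuming the label strings in the JSON files are exactly "Grade 1", "Grade 2", and "Grade 3" as written in the list above (please check a few files first). `GRADE_TO_INDEX`, `grade_index`, and `grade_histogram` are hypothetical names, and the 0-based ordering is an arbitrary choice.

```python
from collections import Counter
from typing import Iterable

# Assumed label strings; adjust if the JSON files spell them differently.
GRADE_TO_INDEX = {"Grade 1": 0, "Grade 2": 1, "Grade 3": 2}


def grade_index(label: str) -> int:
    """Map a label string to a 0-based class index; raise KeyError on unknown labels."""
    key = " ".join(label.split())  # collapse stray whitespace, e.g. " Grade  3 "
    if key not in GRADE_TO_INDEX:
        raise KeyError(f"unknown label: {label!r}")
    return GRADE_TO_INDEX[key]


def grade_histogram(labels: Iterable[str]) -> Counter:
    """Count instances per class index, e.g. to check class balance before training."""
    return Counter(grade_index(label) for label in labels)
```

Raising on unknown labels, rather than silently skipping them, makes any spelling mismatch with the real files visible right away.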
Usage Notes
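- Data Splitting: Users should split the dataset into training and test sets by Batch ID (i.e., at the video level), not by individual frames. Randomly splitting frames causes data leakage, because adjacent frames capture the same physical samples. Because both subsets number their batches from Batch_01, the split key should include the subset as well as the Batch ID.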
- ID Mapping: The dataset is designed for instance-level vision tasks (segmentation/detection). Instance-to-Sample ID mapping (tracking specific physical roots across frames) is not explicitly provided in the metadata.
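A batch-level split can be sketched as follows. This is a minimal example, assuming each frame has already been tagged with a `(subset, batch)` key (for instance from its parent directory and file name); `split_by_batch` and its parameters are hypothetical. Whole batches are shuffled and assigned to train or test, so no video contributes frames to both sides.

```python
import random
from typing import Dict, Iterable, List, Tuple

# One group per video: (subset, batch). Batch numbers restart in each subset,
# so the subset has to be part of the key.
Group = Tuple[str, int]


def split_by_batch(
    items: Iterable[Tuple[Group, str]],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Split frame paths into (train, test) so that no batch appears in both."""
    groups: Dict[Group, List[str]] = {}
    for group, path in items:
        groups.setdefault(group, []).append(path)
    keys = sorted(groups)  # sort first so the seeded shuffle is reproducible
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(len(keys) * test_fraction))  # at least one batch in test
    test_keys = set(keys[:n_test])
    train = [p for k in keys if k not in test_keys for p in groups[k]]
    test = [p for k in keys if k in test_keys for p in groups[k]]
    return train, test
```

If scikit-learn is already part of the pipeline, `GroupShuffleSplit` or `GroupKFold` from `sklearn.model_selection` provide the same guarantee. In that case the `(subset, batch)` pair would need to be encoded as a single group label per frame (e.g. `"A_01"`).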
Citations
If you use the dataset in published research, please consider citing the related journal articles and/or the dataset.
- Xu, J., Lu, Y., & Deng, B. (2024). Design, prototyping, and evaluation of a new machine vision-based automated sweetpotato grading and sorting system. Journal of the ASABE, 67, 1369–1380. https://doi.org/10.13031/ja.16051
- Xu, J., & Lu, Y. (2024). Prototyping and evaluation of a novel machine vision system for real-time, automated quality grading of sweetpotatoes. Computers and Electronics in Agriculture, 219, 108826. https://doi.org/10.1016/j.compag.2024.108826
- Zhang, J., Lu, Y., & Xu, J. (2025). A machine vision dataset for automated quality inspection and grading of sweetpotatoes [Data set]. Zenodo. https://doi.org/10.5281/zenodo.18100484
We hope you find the dataset useful.
Files (6.8 GB)
| Name | Size | MD5 checksum |
|---|---|---|
| README.md | 5.9 kB | 3adc91eaff9ce5648c9c2c4bea888252 |
| SP_MVDataset.7z | 6.8 GB | 1e25f38e293b9b08c0f66b26aaabe960 |
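To confirm that a 6.8 GB download completed intact, compare its MD5 digest with the published checksum. A minimal sketch using Python's standard `hashlib`, assuming the files are saved as `README.md` (5.9 kB) and `SP_MVDataset.7z` (6.8 GB, the archive name shown at the root of the directory tree); `md5_of` and `verify` are hypothetical helpers. The file is read in chunks, so the archive never has to fit in memory.

```python
import hashlib
from pathlib import Path

# Published checksums; the file names are an assumption about how the downloads are saved.
EXPECTED_MD5 = {
    "README.md": "3adc91eaff9ce5648c9c2c4bea888252",
    "SP_MVDataset.7z": "1e25f38e293b9b08c0f66b26aaabe960",
}


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks by default."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(download_dir) -> dict:
    """Return {file name: True/False} for each expected file present in download_dir."""
    d = Path(download_dir)
    return {
        name: md5_of(d / name) == digest
        for name, digest in EXPECTED_MD5.items()
        if (d / name).is_file()
    }
```

A `False` entry indicates a corrupted or incomplete download; files that are missing are simply left out of the result.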
Additional details
Related works
- Is published in
- Publication: 10.13031/ja.16051 (DOI)
- Publication: 10.1016/j.compag.2024.108826 (DOI)
Funding
- United States Department of Agriculture: Agricultural Marketing Service Specialty Crop Multi-State Program, AM21SCMPMS1010
- United States Department of Agriculture: Specialty Crop Block Grant Program, G00006341
Dates
- Collected: 2023 (samples collected)
- Created: 2023-12-21 (videos captured)