A machine vision dataset for automated quality inspection and grading of sweetpotatoes

Zhang, Jiaming; Lu, Yuzhen; Xu, Jiajun

doi:10.5281/zenodo.18100484

Published December 30, 2025 | Version v1

Dataset Open

1. Michigan State University

Contributors

Data collectors:

1. Michigan State University

Dataset Structure

SP_MVDataset.7z
├─Subset A
│  ├─ Sweetpotato Sampling Datasheet.docx
│  │
│  ├─ Images
│  │      Batch_01_Frame_001.png
│  │      Batch_01_Frame_002.png
│  │        ...
│  │      Batch_19_Frame_012.png
│  │
│  ├─ Labels
│  │       Batch_01_Frame_001.json
│  │       Batch_01_Frame_002.json
│  │        ...
│  │       Batch_19_Frame_012.json
│  └─ Videos
│          Batch_01.avi
│          Batch_02.avi
│           ...
│          Batch_19.avi
│
└─Subset B
    ├─ Images
    │      Batch_01_Frame_001.png
    │      Batch_01_Frame_002.png
    │       ...
    │      Batch_20_Frame_039.png
    │
    ├─ Labels
    │      Batch_01_Frame_001.json
    │      Batch_01_Frame_002.json
    │       ...
    │      Batch_20_Frame_039.json
    │
    └─ Videos
            Batch_01.avi
            Batch_02.avi
                ...
            Batch_20.avi

Dataset Organization

Once extracted from the archive, the dataset is organized into two subsets:

Subset A: Contains data from 123 commercial sweetpotatoes (Grocery Store source) imaged under ambient indoor lighting.
Subset B: Contains data from 267 fresh-harvested sweetpotatoes (Research Station source) imaged in an enclosed LED chamber.

Additionally, a supplementary file, "Sweetpotato_Sampling_Datasheet.docx", provides population statistics (weight, length, width) and surface conditions for the samples in Subset A. For both subsets, the raw video recordings are provided for further research.

Directory Hierarchy

Within the subset directories, the data is further organized into subfolders for images and annotations:

`Subset_A/Images`: Contains 232 RGB frames (Resolution: 1920×1080 pixels).
`Subset_A/Labels`: Contains 232 corresponding JSON annotation files.
`Subset_A/Videos`: Contains 19 corresponding raw video files.

`Subset_B/Images`: Contains 1168 RGB frames (Resolution: 1280×720 pixels).
`Subset_B/Labels`: Contains 1168 corresponding JSON annotation files.
`Subset_B/Videos`: Contains 20 corresponding raw video files.

Each image is a standard RGB .png file. The samples were rotated on a roller conveyor during acquisition to capture full-surface views.
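
Because every image is expected to have an annotation file with the same stem, a quick consistency check after extraction can catch an incomplete unpack. The following is a minimal sketch, not part of the dataset's tooling: the function name `pair_images_with_labels` is hypothetical, and the subset directory name must match whatever the extracted archive actually uses (the tree above shows `Subset A`, the hierarchy listing shows `Subset_A`).

```python
from pathlib import Path


def pair_images_with_labels(subset_dir):
    """Return sorted (image_path, label_path) pairs for one subset directory.

    Raises ValueError if any image lacks a label with the same stem, or vice versa.
    """
    subset_dir = Path(subset_dir)
    images = {p.stem: p for p in (subset_dir / "Images").glob("*.png")}
    labels = {p.stem: p for p in (subset_dir / "Labels").glob("*.json")}

    # Stems present on only one side indicate a missing image or annotation.
    unmatched = sorted(set(images) ^ set(labels))
    if unmatched:
        raise ValueError(
            f"{len(unmatched)} unmatched file stems, e.g. {unmatched[:5]}"
        )
    return [(images[stem], labels[stem]) for stem in sorted(images)]
```

According to the counts above, this should return 232 pairs for Subset A and 1168 pairs for Subset B.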

Dataset Summary

- Total samples: 390 (123 in Subset A, 267 in Subset B)

- Total images: 1,400

- Total annotated instances: 3,700

- Total videos: 39

- Storage space required: Approximately 6.54 GB (uncompressed)

File Naming Convention

The file naming convention is consistent across images and annotation files to ensure traceability to the original video batches. Each file name includes two key elements:

1. Batch ID: Represents the specific group or video sequence (e.g., "Batch_01").

2. Frame Sequence: Represents the sequential order of the frame extracted (e.g., "Frame_001").

Examples

- Batch_01_Frame_001.png: The first frame extracted from the Batch 1 video.

- Batch_01_Frame_001.json: The corresponding annotation file for the image above.

Note: Files in Subset A and Subset B share this naming convention but are stored in separate parent directories to distinguish the domain/source.
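
The naming convention can be parsed mechanically to recover the batch and frame numbers. The sketch below assumes every file name follows the `Batch_<number>_Frame_<number>` pattern shown in the examples; `NAME_PATTERN`, `FrameName`, and `parse_frame_name` are illustrative names, not part of the dataset.

```python
import re
from typing import NamedTuple

# Pattern inferred from the examples above (e.g. Batch_01_Frame_001.png).
NAME_PATTERN = re.compile(r"^Batch_(\d+)_Frame_(\d+)\.(?:png|json)$")


class FrameName(NamedTuple):
    batch: int
    frame: int


def parse_frame_name(filename):
    """Split an image or label file name into integer batch and frame numbers."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename!r}")
    return FrameName(batch=int(match.group(1)), frame=int(match.group(2)))
```

Since both subsets reuse the same batch numbers, keep the subset name alongside the parsed values whenever files from both are combined.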

Annotation Structure

The annotation files (`.json`) follow the standard LabelMe format. They are fully compatible with common computer vision tools (e.g., LabelMe, AnyLabeling). Each file contains:

1. Shapes:

- label: The visual quality category (Grade 1, Grade 2, or Grade 3).

- points: A list of [x, y] coordinates defining the polygon mask around the sweetpotato instance.

- shape_type: "polygon".

2. Image Path: References the corresponding .png image file.
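
As an illustration of reading these files, the sketch below loads one annotation and derives an axis-aligned bounding box from each polygon. It assumes the usual LabelMe keys `shapes` and `imagePath`; the function name `load_instances` and the returned dictionary layout are hypothetical choices for this example.

```python
import json


def load_instances(label_path):
    """Read a LabelMe JSON file and return (image_path, list of instances).

    Each instance holds the label, the polygon points, and a bounding box
    (x_min, y_min, x_max, y_max) computed from the polygon.
    """
    with open(label_path, encoding="utf-8") as f:
        annotation = json.load(f)

    instances = []
    for shape in annotation["shapes"]:
        # Skip anything that is not a polygon mask.
        if shape.get("shape_type") != "polygon":
            continue
        xs = [x for x, _ in shape["points"]]
        ys = [y for _, y in shape["points"]]
        instances.append(
            {
                "label": shape["label"],
                "polygon": shape["points"],
                "bbox": (min(xs), min(ys), max(xs), max(ys)),
            }
        )
    return annotation["imagePath"], instances
```

The bounding boxes are convenient for detection models, while the raw polygon points can be rasterized into masks for segmentation.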

Class Definitions

Sweetpotato instances are labeled into three visual categories based on the visible surface defects in the specific frame:

- Grade 1 (Normal): High-quality instances with no visible defects or only negligible surface imperfections.

- Grade 2 (Moderate Defects): Instances with visible surface defects that affect appearance (e.g., minor scuffs, skinning).

- Grade 3 (Severe Defects): Instances with significant damage or decay visible (e.g., deep cuts, rot, severe mechanical damage).
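
For training, the three grades usually need to be mapped to integers. This README does not state the exact label strings stored in the JSON files, so the sketch below makes a loud assumption: each label contains exactly one digit, and that digit is 1, 2, or 3 (for example "Grade 1"). The names `GRADE_NAMES`, `grade_from_label`, and `count_grades` are hypothetical.

```python
import re
from collections import Counter

# Short names taken from the class definitions above.
GRADE_NAMES = {1: "Normal", 2: "Moderate Defects", 3: "Severe Defects"}


def grade_from_label(label):
    """Extract the grade number (1, 2, or 3) from a label string.

    Assumption: the label contains exactly one digit, e.g. "Grade 1".
    """
    match = re.fullmatch(r"\D*([123])\D*", label.strip())
    if match is None:
        raise ValueError(f"Cannot infer a grade from label {label!r}")
    return int(match.group(1))


def count_grades(labels):
    """Count how many instances fall into each grade."""
    return Counter(grade_from_label(label) for label in labels)
```

Counting grades per subset is a quick way to check class balance before training.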

Usage Notes

- Data Splitting: Users should split the dataset (Training/Testing) based on the Batch ID (Video level), not by individual frames. Randomly splitting frames will result in data leakage as adjacent frames capture the same samples.

- ID Mapping: The dataset is designed for instance-level vision tasks (segmentation/detection). Instance-to-Sample ID mapping (tracking specific physical roots across frames) is not explicitly provided in the metadata.
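
The batch-level split recommended above can be sketched as follows. This is one possible approach rather than a prescribed protocol: the function name `split_by_batch`, the default 20% test fraction, and the fixed seed are all assumptions for illustration. Because Subset A and Subset B reuse the same batch numbers, apply it to each subset separately.

```python
import random
import re
from collections import defaultdict

BATCH_PATTERN = re.compile(r"Batch_(\d+)_Frame_\d+")


def split_by_batch(filenames, test_fraction=0.2, seed=0):
    """Split file names into train/test lists so that no batch spans both."""
    by_batch = defaultdict(list)
    for name in filenames:
        match = BATCH_PATTERN.match(name)
        if match is None:
            raise ValueError(f"Unexpected file name: {name!r}")
        by_batch[int(match.group(1))].append(name)

    # Shuffle whole batches, never individual frames.
    batches = sorted(by_batch)
    random.Random(seed).shuffle(batches)
    n_test = max(1, round(len(batches) * test_fraction))
    test_batches = set(batches[:n_test])

    train = sorted(
        name
        for batch, names in by_batch.items()
        if batch not in test_batches
        for name in names
    )
    test = sorted(name for batch in test_batches for name in by_batch[batch])
    return train, test
```

At least one batch is always held out for testing, so with very few batches the training set can end up small or empty.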

Citations

If you use the dataset in published research, please consider citing related journal articles or the dataset.

Xu, J., Lu, Y., & Deng, B. (2024). Design, prototyping, and evaluation of a new machine vision-based automated sweetpotato grading and sorting system. Journal of the ASABE, 67, 1369–1380. https://doi.org/10.13031/ja.16051
Xu, J., & Lu, Y. (2024). Prototyping and evaluation of a novel machine vision system for real-time, automated quality grading of sweetpotatoes. Computers and Electronics in Agriculture, 219, 108826. https://doi.org/10.1016/j.compag.2024.108826
Zhang, J., Lu, Y., & Xu, J. (2025). A machine vision dataset for automated quality inspection and grading of sweetpotatoes [Data set]. Zenodo. https://doi.org/10.5281/zenodo.18100484

We hope you find the dataset useful.

Files (6.8 GB)

Name	MD5	Size
README.md	3adc91eaff9ce5648c9c2c4bea888252	5.9 kB
SP_MVDataset.7z	1e25f38e293b9b08c0f66b26aaabe960	6.8 GB
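
The MD5 checksums listed above can be used to confirm that a download completed without corruption. The sketch below is a generic way to do this with Python's standard library; `md5_of_file` and `EXPECTED_MD5` are names chosen for this example, and the file is read in chunks so the 6.8 GB archive never has to fit in memory.

```python
import hashlib

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "README.md": "3adc91eaff9ce5648c9c2c4bea888252",
    "SP_MVDataset.7z": "1e25f38e293b9b08c0f66b26aaabe960",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A download is intact when `md5_of_file("SP_MVDataset.7z") == EXPECTED_MD5["SP_MVDataset.7z"]` evaluates to true.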

Additional details

Related works

Is published in: Publication: 10.13031/ja.16051 (DOI); Publication: 10.1016/j.compag.2024.108826 (DOI)

Funding

United States Department of Agriculture, Agricultural Marketing Service Specialty Crop Multi-State Program: AM21SCMPMS1010
United States Department of Agriculture, Specialty Crop Block Grant Program: G00006341

Dates

Collected: 2023 (samples collected)
Created: 2023-12-21 (videos captured)

Software

Repository URL: https://github.com/AgFood-Sensing-and-Intelligence-Lab/Sweetpotato-Grading-Dataset-RGB

