Published December 30, 2025 | Version v1
Dataset Open

A machine vision dataset for automated quality inspection and grading of sweetpotatoes

  • 1. Michigan State University

Contributors

Data collectors:

  • 1. Michigan State University

Description

Dataset Structure

SP_MVDataset.7z
├─Subset A
│  ├─ Sweetpotato Sampling Datasheet.docx
│  │
│  ├─ Images
│  │      Batch_01_Frame_001.png
│  │      Batch_01_Frame_002.png
│  │        ...
│  │      Batch_19_Frame_012.png
│  │
│  ├─ Labels
│  │       Batch_01_Frame_001.json
│  │       Batch_01_Frame_002.json
│  │        ...
│  │       Batch_19_Frame_012.json
│  └─ Videos
│          Batch_01.avi
│          Batch_02.avi
│           ...
│          Batch_19.avi
│
└─Subset B
    ├─ Images
    │      Batch_01_Frame_001.png
    │      Batch_01_Frame_002.png
    │       ...
    │      Batch_20_Frame_039.png
    │
    ├─ Labels
    │      Batch_01_Frame_001.json
    │      Batch_01_Frame_002.json
    │       ...
    │      Batch_20_Frame_039.json
    │
    └─ Videos
            Batch_01.avi
            Batch_02.avi
                ...
            Batch_20.avi

Dataset Organization

    After extraction from the archive, the dataset is organized into two main components:
  1. Subset A: Contains data from 123 commercial sweetpotatoes (Grocery Store source) imaged under ambient indoor lighting.
  2. Subset B: Contains data from 267 fresh-harvested sweetpotatoes (Research Station source) imaged in an enclosed LED chamber.
    Additionally, a supplementary file "Sweetpotato_Sampling_Datasheet.docx" is included, providing population statistics (weight, length, width) and surface conditions for the samples in Subset A. For both subsets, the raw video recordings are provided for further research.

Directory Hierarchy

    Within the subset directories, the data is further organized into subfolders for images and annotations:
  • `Subset_A/Images`: Contains 232 RGB frames (Resolution: 1920×1080 pixels).
  • `Subset_A/Labels`: Contains 232 corresponding JSON annotation files.
  • `Subset_A/Videos`: Contains 19 corresponding raw video files.

 

  • `Subset_B/Images`: Contains 1168 RGB frames (Resolution: 1280×720 pixels).
  • `Subset_B/Labels`: Contains 1168 corresponding JSON annotation files.
  • `Subset_B/Videos`: Contains 20 corresponding raw video files.
    Each image is a standard RGB .png file. The samples were rotated on a roller conveyor during acquisition to capture full-surface views.
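
    The layout above suggests that each image has one annotation file with the same file stem. A minimal sketch of how the two folders could be paired and checked after extraction follows. The function name `pair_images_and_labels` is hypothetical, and the folder names `Images` and `Labels` are taken from the tree in this description; the path of the extracted subset directory is an assumption to adjust locally.

```python
from pathlib import Path


def pair_images_and_labels(subset_dir):
    """Return sorted (image_path, label_path) pairs that share a file stem.

    Assumes the subset directory contains `Images/*.png` and `Labels/*.json`,
    as shown in the dataset tree. Raises ValueError if any file lacks a partner.
    """
    subset_dir = Path(subset_dir)
    images = {p.stem: p for p in (subset_dir / "Images").glob("*.png")}
    labels = {p.stem: p for p in (subset_dir / "Labels").glob("*.json")}

    # Stems present in only one of the two folders indicate a broken extraction.
    unmatched = sorted(set(images) ^ set(labels))
    if unmatched:
        raise ValueError(f"Files without a counterpart: {unmatched[:5]}")

    return [(images[stem], labels[stem]) for stem in sorted(images)]
```

    If the extracted data matches the counts listed above, the returned list would be expected to hold 232 pairs for Subset A and 1168 pairs for Subset B.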

Dataset Summary

    - Total samples: 390 (123 in Subset A, 267 in Subset B)
    - Total images: 1,400
    - Total annotated instances: 3,700
    - Total videos: 39
    - Storage space required: Approximately 6.54 GB (uncompressed)

File Naming Convention

    The file naming convention is consistent across images and annotation files to ensure traceability to the original video batches. Each file name includes two key elements:
    1. Batch ID: Represents the specific group or video sequence (e.g., "Batch_01").
    2. Frame Sequence: Represents the sequential order of the frame extracted (e.g., "Frame_001").

Examples

    - Batch_01_Frame_001.png: The first frame extracted from the Batch 1 video.
    - Batch_01_Frame_001.json: The corresponding annotation file for the image above.
 
    Note: Files in Subset A and Subset B share this naming convention but are stored in separate parent directories to distinguish the domain/source.
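
    The naming convention can be parsed with a short regular expression. The sketch below is one possible approach, not part of the dataset tooling; `FrameName` and `parse_frame_name` are hypothetical names, and the pattern assumes every file follows the `Batch_<digits>_Frame_<digits>` form shown in the examples.

```python
import re
from typing import NamedTuple

# Pattern inferred from the examples, e.g. "Batch_01_Frame_001.png".
FRAME_NAME = re.compile(r"^Batch_(\d+)_Frame_(\d+)\.(png|json)$")


class FrameName(NamedTuple):
    batch_id: int    # video batch the frame was extracted from
    frame_seq: int   # sequential order of the extracted frame
    extension: str   # "png" for images, "json" for annotations


def parse_frame_name(file_name: str) -> FrameName:
    """Split an image or annotation file name into batch ID and frame sequence."""
    match = FRAME_NAME.match(file_name)
    if match is None:
        raise ValueError(f"Unexpected file name: {file_name!r}")
    return FrameName(int(match.group(1)), int(match.group(2)), match.group(3))
```

    Because both subsets reuse the same batch numbers, the parsed batch ID identifies a video only together with the subset it came from.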

Annotation Structure

    The annotation files (`.json`) follow the standard LabelMe format. They are fully compatible with common computer vision tools (e.g., LabelMe, AnyLabeling). Each file contains:
    1. Shapes:
        - label: The visual quality category (Grade 1, Grade 2, or Grade 3).
        - points: A list of [x, y] coordinates defining the polygon mask around the sweetpotato instance.
        - shape_type: "polygon".
    2. Image Path: References the corresponding .png image file.
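
    As an illustration of the structure described above, the following sketch reads one annotation file with the Python standard library and derives a bounding box and polygon area for each instance. It assumes the usual LabelMe keys `shapes` and `imagePath`; the helper names `polygon_area` and `read_instances` are hypothetical.

```python
import json


def polygon_area(points):
    """Area of a simple polygon given as [x, y] pairs (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def read_instances(label_path):
    """Return (image_path, instances) parsed from a LabelMe-style JSON file."""
    with open(label_path, encoding="utf-8") as f:
        record = json.load(f)

    instances = []
    for shape in record.get("shapes", []):
        if shape.get("shape_type") != "polygon":
            continue  # the description lists only polygon shapes
        points = shape["points"]
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        instances.append({
            "label": shape["label"],
            "points": points,
            "bbox": (min(xs), min(ys), max(xs), max(ys)),  # x_min, y_min, x_max, y_max
            "area": polygon_area(points),
        })
    return record.get("imagePath"), instances
```

    The bounding box and area are derived values for convenience; only the label, points, and shape type are stored in the annotation files according to the description.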

Class Definitions

    Sweetpotato instances are labeled into three visual categories based on the visible surface defects in the specific frame:
    - Grade 1 (Normal): High-quality instances with no visible defects or only negligible surface imperfections.
    - Grade 2 (Moderate Defects): Instances with visible surface defects that affect appearance (e.g., minor scuffs, skinning).
    - Grade 3 (Severe Defects): Instances with significant damage or decay visible (e.g., deep cuts, rot, severe mechanical damage).
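
    For training, the three grades usually need to be mapped to integer class indices. The sketch below shows one way to do this. The exact label strings stored in the JSON files are not stated in this description, so the pattern (which accepts forms such as "Grade 1", "grade_2", or "3") is an assumption to verify against the actual annotations; `grade_from_label` and `class_index` are hypothetical helper names.

```python
import re

# Human-readable names taken from the class definitions above.
GRADE_NAMES = {1: "Normal", 2: "Moderate Defects", 3: "Severe Defects"}

# Assumed label spellings: optional "grade" prefix, optional separator, digit 1-3.
_GRADE_PATTERN = re.compile(r"\s*(?:grade)?[\s_-]*([123])\s*", re.IGNORECASE)


def grade_from_label(label: str) -> int:
    """Extract the grade number (1, 2, or 3) from an annotation label string."""
    match = _GRADE_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"Unrecognized grade label: {label!r}")
    return int(match.group(1))


def class_index(label: str) -> int:
    """Zero-based class index, as commonly expected by training frameworks."""
    return grade_from_label(label) - 1
```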

Usage Notes

    - Data Splitting: Users should split the dataset (Training/Testing) based on the Batch ID (Video level), not by individual frames. Randomly splitting frames will result in data leakage as adjacent frames capture the same samples.
    - ID Mapping: The dataset is designed for instance-level vision tasks (segmentation/detection). Instance-to-Sample ID mapping (tracking specific physical roots across frames) is not explicitly provided in the metadata.
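
    A batch-level split as recommended above could be sketched as follows. This is an illustrative example rather than the authors' procedure; the function name `split_by_batch`, the default test fraction, and the seed are assumptions. Because Subset A and Subset B reuse the same batch numbers, the function is meant to be applied to the files of one subset at a time.

```python
import random
import re
from collections import defaultdict
from pathlib import Path


def split_by_batch(file_names, test_fraction=0.2, seed=0):
    """Split frames into train/test lists so that no video batch spans both."""
    groups = defaultdict(list)
    for name in file_names:
        match = re.match(r"(Batch_\d+)_Frame_\d+", Path(name).name)
        if match is None:
            raise ValueError(f"Unexpected file name: {name!r}")
        groups[match.group(1)].append(name)

    batches = sorted(groups)
    if len(batches) < 2:
        raise ValueError("Need at least two batches to form a split")

    # Shuffle whole batches, not individual frames, to avoid leakage.
    random.Random(seed).shuffle(batches)
    n_test = min(len(batches) - 1, max(1, round(len(batches) * test_fraction)))

    test = [name for batch in batches[:n_test] for name in groups[batch]]
    train = [name for batch in batches[n_test:] for name in groups[batch]]
    return train, test
```

    Splitting at the batch level keeps all frames of a video on the same side of the split, which addresses the leakage concern for adjacent frames.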
 

Citations

If you use the dataset in published research, please consider citing related journal articles or the dataset.

  1. Xu, J., Lu, Y., & Deng, B. (2024). Design, prototyping, and evaluation of a new machine vision-based automated sweetpotato grading and sorting system. Journal of the ASABE, 67, 1369–1380. https://doi.org/10.13031/ja.16051
  2. Xu, J., & Lu, Y. (2024). Prototyping and evaluation of a novel machine vision system for real-time, automated quality grading of sweetpotatoes. Computers and Electronics in Agriculture, 219, 108826. https://doi.org/10.1016/j.compag.2024.108826
  3. Zhang, J., Lu, Y., & Xu, J. (2025). A machine vision dataset for automated quality inspection and grading of sweetpotatoes [Data set]. Zenodo. https://doi.org/10.5281/zenodo.18100484

We hope you find the dataset useful.

Files

README.md

Files (6.8 GB)

Checksum Size
md5:3adc91eaff9ce5648c9c2c4bea888252 5.9 kB
md5:1e25f38e293b9b08c0f66b26aaabe960 6.8 GB
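
The md5 checksums listed for the files can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module; the function name `md5_of_file` is hypothetical, and pairing the 6.8 GB entry with the SP_MVDataset.7z archive is an assumption based on the sizes shown.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Chunked reads keep memory use small even for a multi-gigabyte archive.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """True if the file's md5 digest equals the expected value (case-insensitive)."""
    expected = expected_md5.lower().removeprefix("md5:")
    return md5_of_file(path) == expected
```

A call such as `matches_checksum("SP_MVDataset.7z", "md5:1e25f38e293b9b08c0f66b26aaabe960")` would then return `True` for an intact download, assuming that checksum belongs to the archive. Note that `str.removeprefix` requires Python 3.9 or later.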

Additional details

Related works

Is published in
Publication: 10.13031/ja.16051 (DOI)
Publication: 10.1016/j.compag.2024.108826 (DOI)

Funding

United States Department of Agriculture
Agricultural Marketing Service Specialty Crop Multi-State Program AM21SCMPMS1010
United States Department of Agriculture
Specialty Crop Block Grant Program G00006341

Dates

Collected
2023
Samples collected.
Created
2023-12-21
Videos captured.