A LiDAR-based Machine Vision Dataset for Online Volume Measurement of Sweetpotatoes
Dataset Structure
```
SP_3D_Dataset.zip
├─ raw_PointClouds
│ ├─ SampleID_001
│ │ ├─ ...
│ │ ├─ SampleID_001_FrameID_0xx.laz
│ │ └─ ...
│ ├─ ...
│ └─ SampleID_200
│
├─ segmented_PointClouds
│ ├─ SampleID_001
│ │ ├─ ...
│ │ ├─ SampleID_001_FrameID_0xx.laz
│ │ └─ ...
│ ├─ ...
│ └─ SampleID_200
│
├─ selected_Images
│ ├─ SampleID_001
│ │ ├─ ...
│ │ ├─ SampleID_001_FrameID_0xx.png
│ │ └─ ...
│ ├─ ...
│ └─ SampleID_200
│
└─ sweetpotato_volume_ground-truth.xlsx
```
---
Dataset Organization
Once the archive is extracted, the dataset is organized into three primary functional folders and one ground-truth file:
- **raw_PointClouds**: Contains raw 3D point cloud data (.laz format) directly acquired from the LiDAR sensor.
- **segmented_PointClouds**: Contains cleaned point cloud data after background removal and statistical denoising.
- **selected_Images**: Contains synchronized 2D RGB images (.png).
- **sweetpotato_volume_ground-truth.xlsx**: A spreadsheet containing physical reference measurements for all 200 samples.
Directory Hierarchy
The dataset follows a consistent hierarchical structure across all directories:
- Each primary folder contains 200 subfolders (labeled `SampleID_001` to `SampleID_200`), corresponding to specific sweetpotato samples.
- Image Resolution: RGB and Depth maps are 1280 × 720 pixels.
- Temporal Consistency: The original frame indices from the raw recording were preserved to allow for multi-view fusion and tracking research.
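The hierarchy above can be traversed programmatically. A minimal sketch using `pathlib` is shown below (the folder and file names follow the documented structure; the root path and the helper name `iter_frames` are illustrative, and the released `data_loader.py` in the `Demo` folder is the reference implementation):

```python
from pathlib import Path

def iter_frames(root, modality="segmented_PointClouds"):
    """Yield (sample_id, frame_path) pairs for one modality folder.

    `root` is the unzipped SP_3D_Dataset directory; `modality` is one of
    raw_PointClouds, segmented_PointClouds, or selected_Images.
    """
    for sample_dir in sorted((Path(root) / modality).glob("SampleID_*")):
        for frame in sorted(sample_dir.iterdir()):
            yield sample_dir.name, frame

# Example (assumes the archive is extracted to ./SP_3D_Dataset):
# for sample, frame in iter_frames("SP_3D_Dataset"):
#     print(sample, frame.name)
```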
Dataset Summary
- Total Samples: 200 "Beauregard" sweetpotatoes.
- Imaging System: Custom LiDAR-based roller conveyor (Intel RealSense L515).
- Reference Method: Standard water displacement method (average of two replicates).
- Storage Space: Approximately 15.5 GB (uncompressed).
Camera & Imaging Specifications
The Intel RealSense L515 LiDAR camera was configured with the following settings to ensure data consistency:
| Parameter                  | Configuration / Value                          |
|----------------------------|------------------------------------------------|
| **Sensor Model**           | Intel RealSense™ L515 LiDAR                    |
| **RGB Resolution**         | 1280 × 720 pixels (.png)                       |
| **Depth Resolution**       | 1280 × 720 pixels (.ply)                       |
| **Frame Rate**             | 30 FPS                                         |
| **Laser Wavelength**       | 860 nm                                         |
| **Mounting Height**        | 0.43 m above the conveyor                      |
| **Imaging Lighting**       | Ambient indoor light (no controlled lighting)  |
| **Conveyor Speed**         | 10 mm/s                                        |
| **Exposure Time**          | 1250 µs                                        |
| **Gain**                   | 10                                             |
| **Brightness**             | 1                                              |
| **Contrast**               | 50                                             |
| **Backlight Compensation** | 98                                             |
| **Saturation**             | 50                                             |
| **Sharpness**              | 80                                             |
| **White Balance**          | 4600 K                                         |
File Naming Convention
The naming convention ensures traceability and temporal alignment:
`SampleID_[ID]_FrameID_[ID].[ext]`
1. **SampleID**: Unique identifier for each physical sweetpotato root.
2. **FrameID**: Sequential order of the frame extracted from the continuous recording.
Example: `SampleID_001_FrameID_015.png` is the 15th frame of the 1st sample.
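The convention can be parsed with a short regular expression; a minimal sketch (the helper name `parse_frame_name` is illustrative):

```python
import re

# Pattern for the documented convention: SampleID_[ID]_FrameID_[ID].[ext]
FRAME_RE = re.compile(r"SampleID_(\d+)_FrameID_(\d+)\.(\w+)$")

def parse_frame_name(filename):
    """Return (sample_id, frame_id, extension) parsed from a dataset filename."""
    m = FRAME_RE.match(filename)
    if m is None:
        raise ValueError(f"Unrecognized filename: {filename}")
    return int(m.group(1)), int(m.group(2)), m.group(3)

print(parse_frame_name("SampleID_001_FrameID_015.png"))  # → (1, 15, 'png')
```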
Data Processing & Feature Extraction
- Segmentation: Binary masks were generated using HSV thresholding ($T_{min}=[7,40,120]$, $T_{max}=[20,165,219]$), flood-filling, and morphological opening.
- Denoising: Statistical Outlier Removal (SOR) was applied using 50 nearest neighbors and a 0.02 standard deviation ratio.
- Potential Usage: The high-density point clouds support the extraction of 2D features (area, perimeter, radial distance) and 3D features (projected volume, surface area) as described in the related research.
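The SOR denoising rule can be illustrated with a minimal NumPy reimplementation. This is a brute-force sketch for illustration only, not the original processing pipeline; it applies the stated parameters (50 neighbors, 0.02 standard deviation ratio) by discarding points whose mean distance to their k nearest neighbors exceeds the global mean plus `std_ratio` standard deviations:

```python
import numpy as np

def sor_filter(points, k=50, std_ratio=0.02):
    """Statistical Outlier Removal: drop points whose mean distance to their
    k nearest neighbors exceeds mu + std_ratio * sigma, where mu and sigma
    are computed over all points. Brute-force O(n^2) pairwise distances;
    use a KD-tree for large clouds."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude each point from its own neighborhood
    k_eff = min(k, len(points) - 1)
    knn_mean = np.sort(d, axis=1)[:, :k_eff].mean(axis=1)
    thresh = knn_mean.mean() + std_ratio * knn_mean.std()
    return points[knn_mean <= thresh]
```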
Demo
A minimal workflow is provided for readers to validate the data; see the `Demo` folder for details. In the unzipped demo folder, the spreadsheet `Ft.csv` contains a feature set for the 200 samples, each comprising 6 extracted frames with 8 features per RGB-D frame, i.e., a 1200 × 8 feature matrix, while the spreadsheet `y.csv` contains the corresponding volume ground truth. The same feature and ground-truth data are also saved as MATLAB files, `Ft.mat` and `y.mat`, in the `Demo` folder. To demonstrate volume prediction, features for samples 1-134 were used for model development and samples 135-200 for testing, as described in Xu et al. (2024). The Python script `data_loader.py` is included for traversing the synchronized 2D/3D dataset, and the script `SW_VOL_PRED_MLR_PCA.py` performs volume modeling and prediction using simple methods, i.e., multiple linear regression (MLR) and principal component regression (principal component analysis + MLR). These models achieved accuracy comparable to that reported in the related paper (Xu et al., 2024); the model results are reported in Zhang et al. (2026b).
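The MLR and PCA + MLR baselines can be sketched in plain NumPy. This is an illustrative sketch, not the released `SW_VOL_PRED_MLR_PCA.py` script; in particular, how the six per-frame feature rows are aggregated into one vector per sample is an assumption here, so consult the demo script for the exact procedure:

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def pca_transform(X, n_components, mean=None, components=None):
    """PCA via SVD. Fit on the training split (mean/components left as None),
    then pass the returned mean/components to project the test split."""
    if mean is None:
        mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        components = Vt[:n_components]
    return (X - mean) @ components.T, mean, components
```

With the demo spreadsheets, one would load `Ft.csv` and `y.csv`, aggregate frames per sample, fit on samples 1-134, and evaluate on samples 135-200, mirroring the split described above.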
Citations
If you use this dataset in your research, please cite the following:
**Related Research Article:**
Xu, J., Lu, Y., Olaniyi, E., & Harvey, L. (2024). Online volume measurement of sweetpotatoes by a LiDAR-based machine vision system. *Journal of Food Engineering*, 361, 111725. https://doi.org/10.1016/j.jfoodeng.2023.111725
**Dataset:**
Zhang, J., Lu, Y., Kong, Z., & Xu, J. (2026a). A LiDAR-based Machine Vision Dataset for Online Volume Measurement of Sweetpotatoes [Data set]. *Zenodo*. https://doi.org/10.5281/zenodo.18378019.
Zhang, J., Lu, Y., Kong, Z., & Xu, J. (2026b). A LiDAR-based Machine Vision Dataset for Online Volume Measurement of Sweetpotatoes. *Data in Brief* (pending).
---
*For questions regarding the data collection or system configuration, please contact luyuzhen@msu.edu.*