A LiDAR-based Machine Vision Dataset for Online Volume Measurement of Sweetpotatoes
Dataset Structure
```
SP_3D_Dataset.zip
├─ raw_PointClouds
│ ├─ SampleID_001
│ │ ├─ ...
│ │ ├─ SampleID_001_FrameID_0xx.laz
│ │ └─ ...
│ ├─ ...
│ └─ SampleID_200
│
├─ segmented_PointClouds
│ ├─ SampleID_001
│ │ ├─ ...
│ │ ├─ SampleID_001_FrameID_0xx.laz
│ │ └─ ...
│ ├─ ...
│ └─ SampleID_200
│
├─ selected_Images
│ ├─ SampleID_001
│ │ ├─ ...
│ │ ├─ SampleID_001_FrameID_0xx.png
│ │ └─ ...
│ ├─ ...
│ └─ SampleID_200
│
└─ sweetpotato_volume_ground-truth.xlsx
```
---
Dataset Organization
Once the archive is extracted, the dataset is organized into three primary functional folders and one ground-truth file:
- **raw_PointClouds**: Contains raw 3D point cloud data (.laz format) directly acquired from the LiDAR sensor.
- **segmented_PointClouds**: Contains cleaned point cloud data after background removal and statistical denoising.
- **selected_Images**: Contains synchronized 2D RGB images (.png).
- **sweetpotato_volume_ground-truth.xlsx**: A spreadsheet containing physical reference measurements for all 200 samples.
Directory Hierarchy
The dataset follows a consistent hierarchical structure across all directories:
- Each primary folder contains 200 subfolders (labeled `SampleID_001` to `SampleID_200`), corresponding to specific sweetpotato samples.
- Image Resolution: RGB and Depth maps are 1280 × 720 pixels.
- Temporal Consistency: The original frame indices from the raw recording were preserved to allow for multi-view fusion and tracking research.
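The hierarchy above can be traversed programmatically. A minimal sketch using `pathlib` is shown below (the folder and file names follow the documented structure; the root path and the helper name `iter_frames` are illustrative, and the released `data_loader.py` in the `Demo` folder is the reference implementation):

```python
from pathlib import Path

def iter_frames(root, modality="segmented_PointClouds"):
    """Yield (sample_id, frame_path) pairs for one modality folder.

    `root` is the unzipped SP_3D_Dataset directory; `modality` is one of
    raw_PointClouds, segmented_PointClouds, or selected_Images.
    """
    for sample_dir in sorted((Path(root) / modality).glob("SampleID_*")):
        for frame in sorted(sample_dir.iterdir()):
            yield sample_dir.name, frame

# Example (assumes the archive is extracted to ./SP_3D_Dataset):
# for sample, frame in iter_frames("SP_3D_Dataset"):
#     print(sample, frame.name)
```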
Dataset Summary
- Total Samples: 200 "Beauregard" sweetpotatoes.
- Imaging System: Custom LiDAR-based roller conveyor (Intel RealSense L515).
- Reference Method: Standard water displacement method (average of two replicates).
- Storage Space: Approximately 15.5 GB (uncompressed).
Camera & Imaging Specifications
The Intel RealSense L515 LiDAR camera was configured with the following settings to ensure data consistency:
| Parameter                  | Configuration / Value                          |
|----------------------------|------------------------------------------------|
| **Sensor Model**           | Intel RealSense™ L515 LiDAR                    |
| **RGB Resolution**         | 1280 × 720 pixels (.png)                       |
| **Depth Resolution**       | 1280 × 720 pixels (.ply)                       |
| **Frame Rate**             | 30 FPS                                         |
| **Laser Wavelength**       | 860 nm                                         |
| **Mounting Height**        | 0.43 m above the conveyor                      |
| **Imaging Lighting**       | Ambient indoor light (no controlled lighting)  |
| **Conveyor Speed**         | 10 mm/s                                        |
| **Exposure Time**          | 1250 µs                                        |
| **Gain**                   | 10                                             |
| **Brightness**             | 1                                              |
| **Contrast**               | 50                                             |
| **Backlight Compensation** | 98                                             |
| **Saturation**             | 50                                             |
| **Sharpness**              | 80                                             |
| **White Balance**          | 4600 K                                         |
File Naming Convention
The naming convention ensures traceability and temporal alignment:
`SampleID_[ID]_FrameID_[ID].[ext]`
1. **SampleID**: Unique identifier for each physical sweetpotato root.
2. **FrameID**: Sequential order of the frame extracted from the continuous recording.
Example: `SampleID_001_FrameID_015.png` is the 15th frame of the 1st sample.
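The convention can be parsed with a short regular expression; a minimal sketch (the helper name `parse_frame_name` is illustrative):

```python
import re

# Pattern for the documented convention: SampleID_[ID]_FrameID_[ID].[ext]
FRAME_RE = re.compile(r"SampleID_(\d+)_FrameID_(\d+)\.(\w+)$")

def parse_frame_name(filename):
    """Return (sample_id, frame_id, extension) parsed from a dataset filename."""
    m = FRAME_RE.match(filename)
    if m is None:
        raise ValueError(f"Unrecognized filename: {filename}")
    return int(m.group(1)), int(m.group(2)), m.group(3)

print(parse_frame_name("SampleID_001_FrameID_015.png"))  # → (1, 15, 'png')
```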
Data Processing & Feature Extraction
- Segmentation: Binary masks were generated using HSV thresholding ($T_{min}=[7,40,120]$, $T_{max}=[20,165,219]$), flood-filling, and morphological opening.
- Denoising: Statistical Outlier Removal (SOR) was applied using 50 nearest neighbors and a 0.02 standard deviation ratio.
- Potential Usage: The high-density point clouds support the extraction of 2D features (area, perimeter, radial distance) and 3D features (projected volume, surface area) as described in the related research.
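The SOR denoising rule can be illustrated with a minimal NumPy reimplementation. This is a brute-force sketch for illustration only, not the original processing pipeline; it applies the stated parameters (50 neighbors, 0.02 standard deviation ratio) by discarding points whose mean distance to their k nearest neighbors exceeds the global mean plus `std_ratio` standard deviations:

```python
import numpy as np

def sor_filter(points, k=50, std_ratio=0.02):
    """Statistical Outlier Removal: drop points whose mean distance to their
    k nearest neighbors exceeds mu + std_ratio * sigma, where mu and sigma
    are computed over all points. Brute-force O(n^2) pairwise distances;
    use a KD-tree for large clouds."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude each point from its own neighborhood
    k_eff = min(k, len(points) - 1)
    knn_mean = np.sort(d, axis=1)[:, :k_eff].mean(axis=1)
    thresh = knn_mean.mean() + std_ratio * knn_mean.std()
    return points[knn_mean <= thresh]
```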
Demo
A minimal workflow is provided for readers to validate the data; see the `Demo` folder for details. In the unzipped demo folder, the spreadsheet `Ft.csv` contains a feature set for the 200 samples, each comprising 6 extracted frames with 8 features per RGB-D frame, i.e., a 1200 × 8 feature matrix, while the spreadsheet `y.csv` contains the corresponding volume ground truth. The same feature and ground-truth data are also saved as MATLAB files, `Ft.mat` and `y.mat`, in the `Demo` folder. To demonstrate volume prediction, features for samples 1-134 were used for model development and samples 135-200 for testing, as described in Xu et al. (2024). The Python script `data_loader.py` is included for traversing the synchronized 2D/3D dataset, and the script `SW_VOL_PRED_MLR_PCA.py` performs volume modeling and prediction using simple methods, i.e., multiple linear regression (MLR) and principal component regression (principal component analysis + MLR). These models achieved accuracy comparable to that reported in the related paper (Xu et al., 2024); the model results are reported in Zhang et al. (2026b).
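The MLR and PCA + MLR baselines can be sketched in plain NumPy. This is an illustrative sketch, not the released `SW_VOL_PRED_MLR_PCA.py` script; in particular, how the six per-frame feature rows are aggregated into one vector per sample is an assumption here, so consult the demo script for the exact procedure:

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def pca_transform(X, n_components, mean=None, components=None):
    """PCA via SVD. Fit on the training split (mean/components left as None),
    then pass the returned mean/components to project the test split."""
    if mean is None:
        mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        components = Vt[:n_components]
    return (X - mean) @ components.T, mean, components
```

With the demo spreadsheets, one would load `Ft.csv` and `y.csv`, aggregate frames per sample, fit on samples 1-134, and evaluate on samples 135-200, mirroring the split described above.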
Citations
If you use this dataset in your research, please cite the following:
**Related Research Article:**
Xu, J., Lu, Y., Olaniyi, E., & Harvey, L. (2024). Online volume measurement of sweetpotatoes by a LiDAR-based machine vision system. *Journal of Food Engineering*, 361, 111725. https://doi.org/10.1016/j.jfoodeng.2023.111725
**Dataset:**
Zhang, J., Lu, Y., Kong, Z., & Xu, J. (2026a). A LiDAR-based Machine Vision Dataset for Online Volume Measurement of Sweetpotatoes [Data set]. *Zenodo*. https://doi.org/10.5281/zenodo.18378019.
Zhang, J., Lu, Y., Kong, Z., & Xu, J. (2026b). A LiDAR-based Machine Vision Dataset for Online Volume Measurement of Sweetpotatoes. *Data in Brief* (pending).
---
*For questions regarding the data collection or system configuration, please contact luyuzhen@msu.edu.*