Published October 7, 2024 | Version v1
Dataset Open

Voxelized fragment dataset for machine learning

Description

A primary challenge in using deep learning models is the scarcity and limited accessibility of datasets large enough to train these networks effectively. This is particularly significant in object detection, shape completion, and fracture assembly. Instead of scanning a large number of real-world fragments, massive datasets can be generated from synthetic pieces. However, realistic fragmentation is computationally intensive in both preparation (e.g., pre-fractured models) and generation. Conversely, simpler algorithms such as Voronoi diagrams run faster at the expense of realism. Hence, generating large datasets for machine learning requires balancing computational efficiency and realism.

We proposed a GPU-based fragmentation method that improves on the baseline Discrete Voronoi Chain for this dataset generation task. The dataset in this repository includes voxelized fragments from high-resolution 3D models, curated to be used as training sets for machine learning models. More specifically, these models come from an archaeological dataset, yielding more than 1M fragments from 1,052 Iberian vessels. In this dataset, fragments are not stored individually; instead, the fragmented voxelizations are provided in a compressed binary file (.rle.zip). Once uncompressed, each fragment is represented by a different number in the grid. The class to which each vessel belongs is also included in class.csv. The GPU-based pipeline that generated this dataset is described in https://doi.org/10.1016/j.cag.2024.104104.
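As an illustration of the "one number per fragment" representation, the following Python sketch expands a run-length encoded label sequence and separates it into fragments. It is not the reader shipped with the dataset: the byte layout of the .rle files is not specified on this page, so the (label, run length) pair layout, the use of 0 as the empty-voxel label, and the function names rle_decode, fragment_sizes and fragment_mask are all assumptions made for this example. The repository linked under Software is the place to look for the actual format.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rle_decode(runs: Iterable[Tuple[int, int]]) -> List[int]:
    """Expand (label, run_length) pairs into a flat list of voxel labels.

    ASSUMPTION: runs are (label, length) pairs; the real .rle layout may differ.
    """
    voxels: List[int] = []
    for label, length in runs:
        if length < 0:
            raise ValueError("run length must be non-negative")
        voxels.extend([label] * length)
    return voxels


def fragment_sizes(voxels: List[int], background: int = 0) -> Counter:
    """Count voxels per fragment label, skipping the background label.

    ASSUMPTION: label 0 marks empty space.
    """
    return Counter(v for v in voxels if v != background)


def fragment_mask(voxels: List[int], fragment_id: int) -> List[bool]:
    """Binary occupancy mask selecting a single fragment."""
    return [v == fragment_id for v in voxels]


# Toy example: a flattened 2x2x2 grid holding two fragments (labels 1 and 2).
runs = [(0, 2), (1, 3), (2, 3)]
grid = rle_decode(runs)
sizes = fragment_sizes(grid)
mask_1 = fragment_mask(grid, 1)
```

For a real grid, the flat list would be reshaped to the voxelization resolution before being fed to a model; one binary mask per label gives the individual fragments.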

Please note that this dataset originally provided voxel data, point clouds, and triangle meshes. However, we opted to include only voxel data because 1) the original dataset is too large to upload to Zenodo and 2) the original intent of our paper is to generate implicit data in the form of voxels. If you are interested in the whole dataset (450 GB), please visit the web page of our research institute.

Files

class.csv

Files (3.1 GB)

Size Checksum
16.5 kB md5:7cb4973d80d3f9e08ee4b955b9d3c3dd
2.1 kB md5:42ffbb74ffe4540003926bd6887a7ac3
3.1 GB md5:822f1d38dc38f33eace8467685e01338

Additional details

Additional titles

Other (English)
Implicit fragment dataset for machine learning

Related works

Is derived from
Journal article: 10.1016/j.cag.2024.104104 (DOI)

Funding

Formación de Profesorado Universitario (FPU) FPU19/00100
Ministerio de Ciencia e Innovación

Dates

Accepted
2024-10

Software

Repository URL
https://github.com/AlfonsoLRz/VoxelFragmentML
Programming language
C++
Development Status
Active