Voxelized fragment dataset for machine learning

López Ruiz, Alfonso; Rueda Ruiz, Antonio Jesús; Segura, Rafael; Ogayar Anguita, Carlos Javier; Navarro, Pablo; Fuertes García, José Manuel

doi:10.5281/zenodo.13899699

Published October 7, 2024 | Version v1

Dataset | Open access

1. Universidad de Jaén

One of the primary challenges in using deep learning models is the difficulty of obtaining datasets large enough to train these networks effectively. This is particularly significant in object detection, shape completion, and fracture assembly. Instead of scanning a large number of real-world fragments, massive datasets can be generated from synthetic pieces. However, realistic fragmentation is computationally intensive, both in the preparation (e.g., pre-fractured models) and in the generation. Conversely, simpler algorithms such as Voronoi diagrams are faster but sacrifice realism. Hence, generating large datasets for machine learning requires balancing computational efficiency and realism.
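To make the speed/realism trade-off concrete, the sketch below shows the simplest kind of Voronoi fragmentation on a voxel grid: every solid voxel is assigned to its nearest seed point, and each resulting cell gets its own integer label (0 is assumed to be empty space). This is a plain, brute-force CPU illustration in Python with NumPy. It is not the GPU Discrete Voronoi Chain method used to build this dataset. The function name `voronoi_fragment` and the random seed placement are hypothetical choices made for the example.

```python
import numpy as np


def voronoi_fragment(occupancy, seeds):
    """Partition the solid voxels of `occupancy` into Voronoi cells (hypothetical helper).

    occupancy: bool array (X, Y, Z), True where the model is solid.
    seeds:     array-like (K, 3), seed positions in voxel coordinates.
    Returns an int32 grid: 0 = empty space, 1..K = fragment labels.
    """
    seeds = np.asarray(seeds, dtype=float)
    labels = np.zeros(occupancy.shape, dtype=np.int32)
    coords = np.argwhere(occupancy)  # (N, 3) indices of solid voxels
    if coords.size == 0:
        return labels
    # Brute force: squared distance from every solid voxel to every seed, shape (N, K).
    d2 = ((coords[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=2)
    # The nearest seed wins; labels start at 1 so that 0 stays "empty".
    labels[tuple(coords.T)] = d2.argmin(axis=1) + 1
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.zeros((32, 32, 32), dtype=bool)
    grid[4:28, 4:28, 4:28] = True  # a solid cube standing in for a scanned model
    seeds = rng.uniform(4, 28, size=(5, 3))  # hypothetical random seed placement
    fragments = voronoi_fragment(grid, seeds)
    print(np.unique(fragments, return_counts=True))
```

Because the cells depend only on distance to the seeds, their boundaries are (voxelized) planes. When a cell is intersected with a non-convex object such as a vessel, it can also split into disconnected pieces. Both effects show why plain Voronoi fragmentation is fast but not especially realistic.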

We proposed a GPU-based fragmentation method that improves on the baseline Discrete Voronoi Chain and used it to carry out this dataset generation task. The dataset in this repository contains voxelized fragments of high-resolution 3D models, curated for use as training sets for machine learning models. More specifically, these models come from an archaeological dataset of 1,052 Iberian vessels, which yielded more than 1M fragments. Fragments are not stored individually. Instead, the fragmented voxelizations are provided in a compressed binary file (.rle.zip). Once uncompressed, each fragment is identified by a distinct number in the grid. The class of each vessel is given in class.csv. The GPU-based pipeline that generated this dataset is described at https://doi.org/10.1016/j.cag.2024.104104.
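As a rough illustration of how such labelled grids can be consumed, the sketch below expands run-length-encoded (value, run length) pairs into a dense grid and then extracts per-fragment statistics and masks. The encoding details are assumptions: runs are taken to cover the grid in C (row-major) order, and label 0 is treated as empty space. The authoritative reader for the real files is `decompress_grid.py`. The helper names `rle_decode`, `fragment_sizes` and `fragment_mask` are hypothetical.

```python
import numpy as np


def rle_decode(values, run_lengths, shape):
    """Expand (value, run length) pairs into a dense labelled voxel grid.

    ASSUMPTION: runs are laid out in C (row-major) order over `shape`.
    The real on-disk layout is defined by decompress_grid.py, not by this sketch.
    """
    flat = np.repeat(np.asarray(values), np.asarray(run_lengths))
    if flat.size != int(np.prod(shape)):
        raise ValueError("run lengths do not add up to the grid size")
    return flat.reshape(shape)


def fragment_sizes(grid, background=0):
    """Map each fragment label to its voxel count, skipping the (assumed) background label."""
    labels, counts = np.unique(grid, return_counts=True)
    return {int(lab): int(c) for lab, c in zip(labels, counts) if lab != background}


def fragment_mask(grid, label):
    """Boolean occupancy grid containing only the voxels of one fragment."""
    return grid == label


if __name__ == "__main__":
    # Toy 2x2x2 grid: 3 empty voxels, fragment 1 (2 voxels), fragment 2 (2 voxels), 1 empty.
    grid = rle_decode([0, 1, 2, 0], [3, 2, 2, 1], (2, 2, 2))
    print(fragment_sizes(grid))  # {1: 2, 2: 2}
```

Storing all fragments of a vessel in one labelled grid, rather than one file per fragment, means a single decompression yields every piece. Each piece can then be isolated with a simple equality test against its label.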

Please note that this dataset originally provided voxel data, point clouds and triangle meshes. However, we chose to include only voxel data because 1) the original dataset is too large to be uploaded to Zenodo and 2) the original intent of our paper is to generate implicit data in the form of voxels. If you are interested in the complete dataset (450 GB), please visit the web page of our research institute.

Files (3.1 GB)

Name	Size	MD5
class.csv	16.5 kB	7cb4973d80d3f9e08ee4b955b9d3c3dd
decompress_grid.py	2.1 kB	42ffbb74ffe4540003926bd6887a7ac3
vessels_binary.zip	3.1 GB	822f1d38dc38f33eace8467685e01338
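Given the size of `vessels_binary.zip`, it can be worth checking downloads against the MD5 checksums listed for each file. The sketch below uses only Python's standard `hashlib` and `pathlib`. It reads files in chunks so the 3.1 GB archive is never loaded into memory at once. The helper names `md5_of` and `verify_downloads` are hypothetical, and the code assumes the files keep their original names in a single directory.

```python
import hashlib
from pathlib import Path

# Checksums as listed in the file table of this record.
EXPECTED = {
    "class.csv": "7cb4973d80d3f9e08ee4b955b9d3c3dd",
    "decompress_grid.py": "42ffbb74ffe4540003926bd6887a7ac3",
    "vessels_binary.zip": "822f1d38dc38f33eace8467685e01338",
}


def md5_of(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in 1 MiB chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_downloads(directory="."):
    """Return {filename: matches_expected} for the listed files present in `directory`."""
    results = {}
    for name, expected in EXPECTED.items():
        path = Path(directory) / name
        if path.exists():
            results[name] = md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'CHECKSUM MISMATCH'}")
```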

Additional details

Other (English): Implicit fragment dataset for machine learning

Is derived from: Journal article: 10.1016/j.cag.2024.104104 (DOI)

Ministerio de Ciencia, Innovación y Universidades
Formación de Profesorado Universitario (FPU) FPU19/00100

Accepted: 2024-10

Repository URL: https://github.com/AlfonsoLRz/VoxelFragmentML
Programming language: C++
Development Status: Active
