Blind stamps dataset

Štursa, Dominik; Dolezel, Petr; Roleček, Jiří; Kopáčik, Ivan

doi:10.5281/zenodo.17494657

Published October 31, 2025 | Version v1

Dataset Open

1. University of Pardubice

The dataset combines real exemplar stamp crops (photographs of controlled physical impressions of each documented Rajman tool) with automated synthetic composition to produce annotated training, validation, and test images. The physical impressions were produced by pressing each documented metal stamp onto prepared leather or leather-like sheets and photographing the results; the resulting exemplar crops and their masks serve as building blocks for the synthetic images. Two synthetic pipelines were implemented:

Procedural Compositional Synthesis (PCS): authentic stamp crops are rotated (without rescaling), blended in a seam-preserving manner onto synthesized leather-like canvases (patch-quilted from source backgrounds, with a gradient fallback), and photometrically augmented both globally and locally (brightness/contrast, hue shifts, additive noise, blur). Seamless cloning / feathered alpha blending and Poisson-style compositing are used to reduce cut-and-paste artifacts. Automatic annotation export produces tight bounding boxes and class labels for every placed stamp.
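As a rough illustration of the feathered alpha blending and tight bounding-box export mentioned for PCS, the following is a minimal pure-Python sketch on grayscale grids. It is not the authors' implementation: the function names, the box-blur feathering, and the inclusive pixel-coordinate convention for boxes are all assumptions made for the example, and rotation, patch-quilting, and Poisson-style compositing are left out.

```python
def feather_alpha(mask, radius=1):
    """Soften a binary mask with a box blur; pixels outside the crop count as 0.

    Assumption: a box blur stands in for whatever feathering the pipeline uses.
    """
    h, w = len(mask), len(mask[0])
    window = (2 * radius + 1) ** 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += mask[yy][xx]
            out[y][x] = total / window
    return out


def blend(canvas, crop, alpha, top, left):
    """Alpha-blend a grayscale crop onto the canvas in place."""
    for y, row in enumerate(crop):
        for x, value in enumerate(row):
            a = alpha[y][x]
            cy, cx = top + y, left + x
            canvas[cy][cx] = a * value + (1.0 - a) * canvas[cy][cx]


def tight_bbox(mask, top, left):
    """Return (x_min, y_min, x_max, y_max) of the non-zero mask pixels.

    Coordinates are in canvas space and inclusive (an assumed convention).
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (left + xs[0], top + ys[0], left + xs[-1], top + ys[-1])


# Hypothetical toy data: a 4x4 crop whose stamp occupies the inner 2x2 pixels.
canvas = [[100.0] * 8 for _ in range(8)]
crop = [[200.0] * 4 for _ in range(4)]
mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]

blend(canvas, crop, feather_alpha(mask, radius=1), top=1, left=2)
bbox = tight_bbox(mask, top=1, left=2)
```

The bounding box is computed from the binary mask rather than the feathered alpha, so the soft blending margin does not enlarge the annotation.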

Background-Normalized Synthetic Stamping (BNS): stamp crops are foreground-segmented and edge-conditioned (Otsu or adaptive thresholding with fill-ratio checks), isolated from local background, and composited onto synthetic gray-noise substrates with instance-specific noise amplitude and optional low-frequency shading. Local photometric perturbations simulate embossing contrast variability. Placement and annotation mirror PCS (bounding boxes + class labels).
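The foreground segmentation step in BNS can be sketched with a plain Otsu threshold followed by a fill-ratio check. This is only an illustration under stated assumptions: the record does not give the fill-ratio bounds, so `min_fill` and `max_fill` are hypothetical values, and treating pixels brighter than the threshold as foreground is likewise an assumption (the polarity may be reversed for actual stamp crops). Adaptive thresholding is not shown.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximizes between-class variance.

    Classes are pixels <= t and pixels > t; pixels must be integers in 0..255.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # intensity sum of the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def segment(gray, min_fill=0.05, max_fill=0.95):
    """Threshold a grayscale grid and check the foreground fill ratio.

    min_fill and max_fill are hypothetical bounds, not values from the dataset.
    Returns (mask, fill_ratio, accepted).
    """
    flat = [p for row in gray for p in row]
    t = otsu_threshold(flat)
    mask = [[1 if p > t else 0 for p in row] for row in gray]
    fill = sum(map(sum, mask)) / len(flat)
    return mask, fill, min_fill <= fill <= max_fill


# Hypothetical toy crop: a bright 2x2 stamp on a darker 4x4 background.
gray = [
    [40, 40, 40, 40],
    [40, 200, 200, 40],
    [40, 200, 200, 40],
    [40, 40, 40, 40],
]
mask, fill, accepted = segment(gray)
```

A fill-ratio check of this kind rejects degenerate masks, for example a nearly uniform crop where thresholding marks almost everything or almost nothing as foreground.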

Contents and structure:
Each generated image has an accompanying machine-produced annotation text file listing the placed object instances as: tool_identifier, x_min, y_min, x_max, y_max (suitable for standard object detection frameworks). For every stamp exemplar, the dataset also provides the crop image and binary mask used as input primitives. Images are supplied in a lossless image format (e.g., PNG) and annotations as UTF-8 text files; CSV, YOLO-compatible, or PascalVOC-like files can be produced from the included annotations.
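To show how the listed annotation fields might be converted for a detection framework, here is a minimal sketch that parses lines of the form tool_identifier, x_min, y_min, x_max, y_max and emits YOLO-style normalized rows. The comma delimiter, the example identifier `R12`, the image size, and the `class_ids` mapping are assumptions for illustration; the metadata file in the archive should be consulted for the actual layout.

```python
def parse_annotations(text):
    """Parse 'tool_identifier, x_min, y_min, x_max, y_max' lines into tuples.

    Assumes one comma-separated instance per line; blank lines are skipped.
    """
    boxes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        tool_id, *coords = [part.strip() for part in line.split(",")]
        x_min, y_min, x_max, y_max = (float(c) for c in coords)
        boxes.append((tool_id, x_min, y_min, x_max, y_max))
    return boxes


def to_yolo(box, img_w, img_h, class_ids):
    """Convert one box to a YOLO row: class x_center y_center width height.

    All four geometric values are normalized by the image size.
    class_ids is a hypothetical mapping from tool identifier to integer class.
    """
    tool_id, x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_ids[tool_id]} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical annotation content and image size.
sample = "R12, 100, 50, 300, 150\n"
boxes = parse_annotations(sample)
rows = [to_yolo(b, img_w=400, img_h=200, class_ids={"R12": 0}) for b in boxes]
```

Because YOLO rows are normalized, the image width and height must be read from each image before conversion; they are hard-coded here only to keep the example short.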

Splits and sample counts:
The paper reports three training corpora (PCS, BNS, Combined) and gives the following per-corpus split:

Training: 700 images
Validation: 150 images
Test: 150 images.

A metadata description in TXT format is included in the attached ZIP file.

Files

Files (4.9 GB)

Name	Size
Dataset_blind_stamps.zip (md5:4fc6bc2357ff82d7245bb2fd71c45c87)	4.9 GB
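
Since the record publishes an MD5 checksum for the archive, a download can be verified before unpacking. The sketch below uses Python's standard `hashlib` and reads the file in chunks so the 4.9 GB archive is never loaded into memory at once; the function names and the local path in the usage note are assumptions.

```python
import hashlib

# Checksum as listed for Dataset_blind_stamps.zip in this record.
EXPECTED_MD5 = "4fc6bc2357ff82d7245bb2fd71c45c87"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify("Dataset_blind_stamps.zip")` would be called on the downloaded file, assuming it was saved under that name in the working directory.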
