D-FINE Nano region segmentation model trained on LADaS
Description
D-FINE Nano LADaS Region Segmentation
A D-FINE Nano region segmentation model for document layout analysis, trained on the LADaS dataset. The model detects 37 region types following the SegmOnto vocabulary.
This is the smallest model in the D-FINE LADaS series, suitable for resource-constrained environments.
Architecture
D-FINE is a transformer-based (DETR-style) object detector, applied here to document layout analysis. This Nano variant uses:
- Backbone: HGNetv2-B0
- Encoder hidden dim: 128
- Decoder hidden dim: 128
- Transformer decoder layers: 3
- Feature strides: [16, 32]
- Detection queries: 300
- Input resolution: 1280x1280
- Model size: ~15 MB
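To get a feel for what these settings imply, the following minimal sketch computes the feature-map sizes and the number of spatial positions for a 1280x1280 input at strides 16 and 32. It assumes each stride yields a square grid of `input_size // stride` cells, which is the usual convention but is not stated in this card; `feature_grid_sizes` and `total_positions` are hypothetical helper names.

```python
# Illustrative arithmetic only; the constants come from the architecture list.
INPUT_SIZE = 1280           # input resolution (pixels per side)
FEATURE_STRIDES = [16, 32]  # feature strides


def feature_grid_sizes(input_size, strides):
    """Return the side length of the feature grid at each stride.

    Assumes plain integer downsampling (input_size // stride).
    """
    return [input_size // s for s in strides]


def total_positions(input_size, strides):
    """Number of spatial positions summed over all feature levels."""
    return sum(side * side for side in feature_grid_sizes(input_size, strides))


if __name__ == "__main__":
    print(feature_grid_sizes(INPUT_SIZE, FEATURE_STRIDES))  # [80, 40]
    print(total_positions(INPUT_SIZE, FEATURE_STRIDES))     # 8000
```

Under that assumption the encoder sees 80x80 and 40x40 grids, 8000 positions in total, from which the 300 detection queries are decoded.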
Training Data
The model was trained on the LADaS (Layout Analysis Dataset) by Thibault Clérice et al. LADaS is a multi-document diachronic layout analysis dataset comprising documents from the 17th century to the present, including monographs, PhD theses, auction catalogs, academic papers, and magazines. Annotations follow the SegmOnto vocabulary.
Uses
Intended for document region segmentation as part of an automatic text recognition pipeline with kraken. Detects document zones such as main text, headings, graphics, margins, tables, and other structural elements in historical and contemporary documents.
Bias, Risks, and Limitations
- The training data is predominantly French with Latin script. Performance on other languages may vary.
- Rare classes are detected poorly or not at all: FigureZone-FigDesc and FigureZone-Head score 0.0 on every reported metric, while GraphicZone-TextualContent (0.0087) and MainZone-Maths (0.0200) have near-zero mAP@50:95.
- The Nano variant trades accuracy for speed and model size. Consider larger variants for production use.
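One way to act on these limitations downstream is to route uncertain detections to manual review rather than trusting them. The sketch below is only an illustration: the `Detection` tuple, the `split_for_review` helper, and the 0.5 score threshold are all hypothetical and are not part of kraken or of this model's output format.

```python
from typing import NamedTuple

# Classes this card singles out as unreliable for the Nano variant.
LOW_CONFIDENCE_CLASSES = {
    "FigureZone-FigDesc",
    "FigureZone-Head",
    "GraphicZone-TextualContent",
    "MainZone-Maths",
}


class Detection(NamedTuple):
    """Hypothetical container for one predicted region."""
    label: str
    score: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


def split_for_review(detections, min_score=0.5, flagged=LOW_CONFIDENCE_CLASSES):
    """Split detections into (accepted, needs_review), preserving order.

    A detection needs review if its class is flagged as unreliable
    or its score is below min_score (an arbitrary example threshold).
    """
    accepted, needs_review = [], []
    for det in detections:
        if det.label in flagged or det.score < min_score:
            needs_review.append(det)
        else:
            accepted.append(det)
    return accepted, needs_review
```

The threshold and the set of flagged classes would need tuning per collection; the per-class results in this card are a reasonable starting point for choosing them.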
Training Details
Training Procedure and Hyperparameters
- Training regime: BF16 mixed precision
- Epochs: 32
- Learning rate: 1e-4
- Schedule: Constant with 1000-step warmup
- Weight decay: 1e-5
- Batch size: 8
- Image size: 1280x1280
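The learning-rate schedule listed above can be sketched as a plain function of the optimizer step. The card does not say how the warmup is shaped, so this assumes a linear ramp from 0 to the base rate over the first 1000 steps; `lr_at` is a hypothetical helper, not the training code.

```python
BASE_LR = 1e-4       # learning rate from the hyperparameter list
WARMUP_STEPS = 1000  # warmup length from the hyperparameter list


def lr_at(step, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate at a 0-indexed optimizer step.

    Assumption: linear warmup from 0 to base_lr, then constant.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr


if __name__ == "__main__":
    for step in (0, 250, 500, 1000, 20000):
        print(step, lr_at(step))
```

After step 1000 the rate stays at 1e-4 for the remainder of the 32 epochs, since the schedule is constant rather than decayed.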
Evaluation
Testing Data
LADaS test split.
Metrics
| | mAP@50 | mAP@50:95 | Precision | Recall | F1 |
|---|---|---|---|---|---|
| Overall | 0.2960 | 0.2084 | 0.2960 | 0.7086 | 0.4176 |
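The F1 column is the harmonic mean of precision and recall. As a quick consistency check, the sketch below recomputes the overall F1 from the reported precision (0.2960) and recall (0.7086); `f1_score` is a small helper written for this illustration, not a function of the evaluation tool.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    overall = f1_score(0.2960, 0.7086)
    print(round(overall, 4))  # 0.4176
```

The low F1 despite a recall above 0.70 reflects the low precision: the model finds most regions but also produces many false positives.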
Per-Class Results
| Class | mAP@50:95 | Precision | Recall | F1 |
|---|---|---|---|---|
| AdvertisementZone | 0.0242 | 0.0384 | 0.5405 | 0.0717 |
| DigitizationArtefactZone | 0.2862 | 0.3970 | 0.6152 | 0.4826 |
| DropCapitalZone | 0.4489 | 0.7038 | 0.6545 | 0.6783 |
| FigureZone | 0.0164 | 0.0204 | 0.3333 | 0.0385 |
| FigureZone-FigDesc | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| FigureZone-Head | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| FormZone | 0.0067 | 0.0116 | 0.4731 | 0.0226 |
| GraphicZone | 0.4101 | 0.5299 | 0.7590 | 0.6241 |
| GraphicZone-Decoration | 0.1904 | 0.2682 | 0.5568 | 0.3620 |
| GraphicZone-FigDesc | 0.0454 | 0.0670 | 0.5765 | 0.1200 |
| GraphicZone-Head | 0.2848 | 0.4802 | 0.6440 | 0.5502 |
| GraphicZone-Part | 0.0433 | 0.0557 | 0.4520 | 0.0991 |
| GraphicZone-TextualContent | 0.0087 | 0.0127 | 0.2602 | 0.0243 |
| MainZone-Continued | 0.4230 | 0.4799 | 0.6577 | 0.5549 |
| MainZone-Date | 0.0561 | 0.0699 | 0.5080 | 0.1230 |
| MainZone-Entry | 0.4415 | 0.5915 | 0.6545 | 0.6214 |
| MainZone-Head | 0.3715 | 0.5556 | 0.6725 | 0.6085 |
| MainZone-Lg | 0.4122 | 0.5353 | 0.7317 | 0.6183 |
| MainZone-ListItem | 0.0730 | 0.1043 | 0.4695 | 0.1707 |
| MainZone-Maths | 0.0200 | 0.0237 | 0.4548 | 0.0451 |
| MainZone-Other | 0.2893 | 0.3153 | 0.5856 | 0.4099 |
| MainZone-P | 0.5431 | 0.6270 | 0.8415 | 0.7186 |
| MainZone-Signature | 0.4327 | 0.4892 | 0.8715 | 0.6267 |
| MainZone-Sp | 0.5568 | 0.6818 | 0.7891 | 0.7316 |
| MarginTextZone-ContinuedNotes | 0.0704 | 0.0797 | 0.4143 | 0.1337 |
| MarginTextZone-ManuscriptAddendum | 0.2401 | 0.4684 | 0.5017 | 0.4845 |
| MarginTextZone-Notes | 0.2552 | 0.4180 | 0.6039 | 0.4940 |
| MusicZone | 0.2682 | 0.4333 | 0.5861 | 0.4983 |
| NumberingZone | 0.2968 | 0.6264 | 0.4965 | 0.5539 |
| QuireMarksZone | 0.2201 | 0.4697 | 0.4000 | 0.4321 |
| RunningTitleZone | 0.3509 | 0.5747 | 0.6205 | 0.5967 |
| StampZone | 0.2901 | 0.3688 | 0.6840 | 0.4792 |
| StampZone-Sticker | 0.0593 | 0.0921 | 0.6061 | 0.1599 |
| TableZone | 0.0831 | 0.1251 | 0.4437 | 0.1951 |
| TableZone-Head | 0.0077 | 0.0131 | 0.2564 | 0.0250 |
| TitlePageZone | 0.1347 | 0.1672 | 0.6935 | 0.2695 |
| TitlePageZone-Index | 0.0508 | 0.0583 | 0.3818 | 0.1012 |
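The overall mAP@50:95 of 0.2084 is consistent with an unweighted (macro) mean of the 37 per-class values, which would mean rare classes weigh as much as frequent ones. The sketch below recomputes that mean from the per-class column; that the evaluation averages this way is an inference from the numbers, not something this card states.

```python
# Per-class mAP@50:95, copied from the per-class results.
PER_CLASS_MAP = {
    "AdvertisementZone": 0.0242,
    "DigitizationArtefactZone": 0.2862,
    "DropCapitalZone": 0.4489,
    "FigureZone": 0.0164,
    "FigureZone-FigDesc": 0.0000,
    "FigureZone-Head": 0.0000,
    "FormZone": 0.0067,
    "GraphicZone": 0.4101,
    "GraphicZone-Decoration": 0.1904,
    "GraphicZone-FigDesc": 0.0454,
    "GraphicZone-Head": 0.2848,
    "GraphicZone-Part": 0.0433,
    "GraphicZone-TextualContent": 0.0087,
    "MainZone-Continued": 0.4230,
    "MainZone-Date": 0.0561,
    "MainZone-Entry": 0.4415,
    "MainZone-Head": 0.3715,
    "MainZone-Lg": 0.4122,
    "MainZone-ListItem": 0.0730,
    "MainZone-Maths": 0.0200,
    "MainZone-Other": 0.2893,
    "MainZone-P": 0.5431,
    "MainZone-Signature": 0.4327,
    "MainZone-Sp": 0.5568,
    "MarginTextZone-ContinuedNotes": 0.0704,
    "MarginTextZone-ManuscriptAddendum": 0.2401,
    "MarginTextZone-Notes": 0.2552,
    "MusicZone": 0.2682,
    "NumberingZone": 0.2968,
    "QuireMarksZone": 0.2201,
    "RunningTitleZone": 0.3509,
    "StampZone": 0.2901,
    "StampZone-Sticker": 0.0593,
    "TableZone": 0.0831,
    "TableZone-Head": 0.0077,
    "TitlePageZone": 0.1347,
    "TitlePageZone-Index": 0.0508,
}

# Unweighted mean over classes: every class counts equally.
macro_map = sum(PER_CLASS_MAP.values()) / len(PER_CLASS_MAP)

if __name__ == "__main__":
    print(len(PER_CLASS_MAP))   # 37
    print(round(macro_map, 4))  # 0.2084
```

Because every class counts equally under this reading, the many low-scoring rare classes pull the overall figure well below the scores of common text zones such as MainZone-P (0.5431) and MainZone-Sp (0.5568).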
Citation
BibTeX:
@misc{clrice2024ladaslargemultitaskdiachronic,
title={LADaS -- a Large multi-task and diachronic dataset for Layout Analysis of diverse historical documents},
author={Thibault Clérice},
year={2024},
eprint={2411.10068},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.10068},
}