Published February 20, 2026 | Version v1
Other Open

D-FINE Nano region segmentation model trained on LADaS

Authors/Creators

  • 1. ALMAnaCH, Inria Paris

Description

D-FINE Nano LADaS Region Segmentation

A D-FINE Nano region segmentation model for document layout analysis, trained on the LADaS dataset. The model detects 37 region types following the SegmOnto vocabulary.

This is the smallest model in the D-FINE LADaS series, suitable for resource-constrained environments.

Architecture

D-FINE is a transformer-based (DETR-style) object detector, applied here to document layout analysis. This Nano variant uses:

  • Backbone: HGNetv2-B0
  • Encoder hidden dim: 128
  • Decoder hidden dim: 128
  • Transformer decoder layers: 3
  • Feature strides: [16, 32]
  • Detection queries: 300
  • Input resolution: 1280x1280
  • Model size: ~15 MB
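To make the figures above concrete, the following minimal sketch records the listed values in a plain Python dataclass and derives the feature-map sizes implied by the two strides. The class name, field names, and helper functions are illustrative assumptions, not identifiers from the D-FINE codebase, and the sketch assumes each feature-map side is simply the input side divided by the stride.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NanoConfig:
    """Hypothetical container for the values listed above (names are illustrative)."""
    backbone: str = "HGNetv2-B0"
    encoder_hidden_dim: int = 128
    decoder_hidden_dim: int = 128
    decoder_layers: int = 3
    feature_strides: tuple = (16, 32)
    num_queries: int = 300
    input_size: int = 1280


def feature_map_sides(cfg: NanoConfig) -> list:
    # Assumption: side length of each feature map = input side // stride.
    return [cfg.input_size // stride for stride in cfg.feature_strides]


def total_spatial_positions(cfg: NanoConfig) -> int:
    # Number of spatial positions summed over both feature levels.
    return sum(side * side for side in feature_map_sides(cfg))


cfg = NanoConfig()
print(feature_map_sides(cfg))        # [80, 40]
print(total_spatial_positions(cfg))  # 8000
```

Under these assumptions, a 1280x1280 input yields 80x80 and 40x40 feature maps, from which the 300 detection queries are decoded.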

Training Data

The model was trained on LADaS (Layout Analysis Dataset) by Thibault Clérice et al. LADaS is a diachronic layout analysis dataset covering multiple document types from the 17th century to the present, including monographs, PhD theses, auction catalogs, academic papers, and magazines. Annotations follow the SegmOnto vocabulary.

Uses

The model is intended for document region segmentation as part of an automatic text recognition pipeline with kraken. It detects document zones such as main text, headings, graphics, margins, tables, and other structural elements in historical and contemporary documents.
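The class names reported for this model follow a `Zone-Subtype` pattern (for example `MainZone-P` or `TableZone-Head`), which makes it easy to group detections by top-level zone before passing them downstream. The sketch below is plain Python and does not use the kraken API. The detection tuple layout `(label, score, box)` and the `min_score` threshold are assumptions made for illustration only.

```python
from collections import defaultdict


def split_label(label):
    """Split 'MainZone-P' into ('MainZone', 'P'); subtype is None when absent."""
    zone, sep, subtype = label.partition("-")
    return zone, (subtype if sep else None)


def group_by_zone(detections, min_score=0.5):
    """Group detections by top-level zone, dropping low-confidence ones.

    detections: iterable of (label, score, (x0, y0, x1, y1)).
    This layout is hypothetical, not the model's actual output format.
    """
    grouped = defaultdict(list)
    for label, score, box in detections:
        if score >= min_score:
            zone, _ = split_label(label)
            grouped[zone].append((label, score, box))
    return dict(grouped)


example = [
    ("MainZone-P", 0.91, (100, 200, 900, 700)),
    ("MainZone-Head", 0.84, (100, 120, 900, 180)),
    ("TableZone", 0.32, (120, 720, 880, 1100)),  # below threshold, dropped
]
print(sorted(group_by_zone(example)))  # ['MainZone']
```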

Bias, Risks, and Limitations

  • The training data consists predominantly of French-language documents in Latin script. Performance on other languages and scripts may vary.
  • Rare classes score very low or zero: FigureZone-FigDesc and FigureZone-Head score 0.0000 on every metric, while GraphicZone-TextualContent (F1 0.0243) and MainZone-Maths (F1 0.0451) are barely detected.
  • The Nano variant trades accuracy for speed and model size. Consider larger variants for production use.

Training Details

Training Procedure and Hyperparameters

  • Training regime: BF16 mixed precision
  • Epochs: 32
  • Learning rate: 1e-4
  • Schedule: Constant with 1000-step warmup
  • Weight decay: 1e-5
  • Batch size: 8
  • Image size: 1280x1280
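The schedule above (a constant learning rate of 1e-4 after a 1000-step warmup) can be sketched as a small function of the optimizer step. The shape of the warmup is not stated in this card, so the linear ramp used here is an assumption, and the function name is illustrative.

```python
BASE_LR = 1e-4       # learning rate listed above
WARMUP_STEPS = 1000  # warmup length listed above


def learning_rate(step, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS):
    """Constant schedule with warmup.

    Assumption: the warmup ramps linearly from base_lr / warmup_steps
    up to base_lr; the card does not specify the warmup shape.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr


# Learning rate early in warmup, halfway through, and after warmup.
for step in (0, 499, 1000, 20000):
    print(step, learning_rate(step))
```

After step 1000 the value stays at 1e-4 for the rest of the 32 epochs, since no decay is applied.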

Evaluation

Testing Data

LADaS test split.

Metrics

| | mAP@50 | mAP@50:95 | Precision | Recall | F1 |
|---|---|---|---|---|---|
| Overall | 0.2960 | 0.2084 | 0.2960 | 0.7086 | 0.4176 |
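The F1 column is consistent with the usual harmonic mean of precision and recall. A minimal check, using the overall precision and recall reported above (the function name is illustrative):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


overall_f1 = f1_score(0.2960, 0.7086)
print(round(overall_f1, 4))  # 0.4176
```

The high recall combined with low precision suggests that the model produces many false positives at the evaluated confidence threshold, which is what pulls F1 down.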

Per-Class Results

| Class | mAP@50:95 | Precision | Recall | F1 |
|---|---|---|---|---|
| AdvertisementZone | 0.0242 | 0.0384 | 0.5405 | 0.0717 |
| DigitizationArtefactZone | 0.2862 | 0.3970 | 0.6152 | 0.4826 |
| DropCapitalZone | 0.4489 | 0.7038 | 0.6545 | 0.6783 |
| FigureZone | 0.0164 | 0.0204 | 0.3333 | 0.0385 |
| FigureZone-FigDesc | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| FigureZone-Head | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| FormZone | 0.0067 | 0.0116 | 0.4731 | 0.0226 |
| GraphicZone | 0.4101 | 0.5299 | 0.7590 | 0.6241 |
| GraphicZone-Decoration | 0.1904 | 0.2682 | 0.5568 | 0.3620 |
| GraphicZone-FigDesc | 0.0454 | 0.0670 | 0.5765 | 0.1200 |
| GraphicZone-Head | 0.2848 | 0.4802 | 0.6440 | 0.5502 |
| GraphicZone-Part | 0.0433 | 0.0557 | 0.4520 | 0.0991 |
| GraphicZone-TextualContent | 0.0087 | 0.0127 | 0.2602 | 0.0243 |
| MainZone-Continued | 0.4230 | 0.4799 | 0.6577 | 0.5549 |
| MainZone-Date | 0.0561 | 0.0699 | 0.5080 | 0.1230 |
| MainZone-Entry | 0.4415 | 0.5915 | 0.6545 | 0.6214 |
| MainZone-Head | 0.3715 | 0.5556 | 0.6725 | 0.6085 |
| MainZone-Lg | 0.4122 | 0.5353 | 0.7317 | 0.6183 |
| MainZone-ListItem | 0.0730 | 0.1043 | 0.4695 | 0.1707 |
| MainZone-Maths | 0.0200 | 0.0237 | 0.4548 | 0.0451 |
| MainZone-Other | 0.2893 | 0.3153 | 0.5856 | 0.4099 |
| MainZone-P | 0.5431 | 0.6270 | 0.8415 | 0.7186 |
| MainZone-Signature | 0.4327 | 0.4892 | 0.8715 | 0.6267 |
| MainZone-Sp | 0.5568 | 0.6818 | 0.7891 | 0.7316 |
| MarginTextZone-ContinuedNotes | 0.0704 | 0.0797 | 0.4143 | 0.1337 |
| MarginTextZone-ManuscriptAddendum | 0.2401 | 0.4684 | 0.5017 | 0.4845 |
| MarginTextZone-Notes | 0.2552 | 0.4180 | 0.6039 | 0.4940 |
| MusicZone | 0.2682 | 0.4333 | 0.5861 | 0.4983 |
| NumberingZone | 0.2968 | 0.6264 | 0.4965 | 0.5539 |
| QuireMarksZone | 0.2201 | 0.4697 | 0.4000 | 0.4321 |
| RunningTitleZone | 0.3509 | 0.5747 | 0.6205 | 0.5967 |
| StampZone | 0.2901 | 0.3688 | 0.6840 | 0.4792 |
| StampZone-Sticker | 0.0593 | 0.0921 | 0.6061 | 0.1599 |
| TableZone | 0.0831 | 0.1251 | 0.4437 | 0.1951 |
| TableZone-Head | 0.0077 | 0.0131 | 0.2564 | 0.0250 |
| TitlePageZone | 0.1347 | 0.1672 | 0.6935 | 0.2695 |
| TitlePageZone-Index | 0.0508 | 0.0583 | 0.3818 | 0.1012 |
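When deciding which classes to trust in a downstream pipeline, it can help to flag those whose F1 falls below a chosen cutoff. The sketch below uses a small subset of the F1 values from the per-class table. The 0.1 threshold and the function name are arbitrary assumptions for illustration, not a recommendation from the model authors.

```python
# Subset of (class, F1) pairs copied from the per-class table.
F1_BY_CLASS = {
    "MainZone-Sp": 0.7316,
    "MainZone-P": 0.7186,
    "DropCapitalZone": 0.6783,
    "TableZone": 0.1951,
    "MainZone-Maths": 0.0451,
    "GraphicZone-TextualContent": 0.0243,
    "FigureZone-FigDesc": 0.0000,
    "FigureZone-Head": 0.0000,
}


def weak_classes(scores, threshold=0.1):
    """Return class names with F1 below the threshold, sorted alphabetically."""
    return sorted(name for name, f1 in scores.items() if f1 < threshold)


print(weak_classes(F1_BY_CLASS))
# ['FigureZone-FigDesc', 'FigureZone-Head', 'GraphicZone-TextualContent', 'MainZone-Maths']
```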

Citation

BibTeX:

@misc{clrice2024ladaslargemultitaskdiachronic,
  title={LADaS -- a Large multi-task and diachronic dataset for Layout Analysis of diverse historical documents},
  author={Thibault Clérice},
  year={2024},
  eprint={2411.10068},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2411.10068},
}
