Dataset Open Access

GloSAT Historical Measurement Table Dataset

Stuart E. Middleton; Juliusz Ziomek

Dataset containing scanned historical measurement table documents from ship logs and land measurement stations. The annotations provided in this dataset are designed to allow finer-grained table detection and table structure recognition models to be trained and tested. Annotations are region boundaries for tables, cells, headings, headers and captions.

This dataset release includes code to train models on a training split, to use trained model checkpoints for inference, and to evaluate inferred results on a test split. The pretrained models used in the published HIP-2021 paper are included in the dataset, so results can be reproduced without training the model checkpoints yourself.

Instructions and code can be found in the linked GitHub repository: https://github.com/stuartemiddleton/glosat_table_dataset

A pre-print of the HIP-2021 paper can be found on the author's website: https://www.southampton.ac.uk/~sem03/HIP_2021.pdf

Original images sourced with permission from the UK Met Office, US NOAA and weatherrescue.org (University of Reading).

This work is part of the GloSAT project https://www.glosat.org/ and supported by the Natural Environment Research Council (NE/S015604/1). The authors acknowledge the use of the IRIDIS High Performance Computing Facility, and associated support services at the University of Southampton, in the completion of this work.

Files (5.2 GB)
Name          Size     MD5
datasets.zip  1.9 GB   eaa0fc13767862b752604bb91210e98e
LICENSE       2.0 kB   a23038c94120c4eb1f5297ad0fee49d6
models.zip    3.3 GB   170318fe6ebc0353c82e1cc1269540ec
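
The MD5 digests listed for each file can be used to check that a download completed without corruption. The following is a minimal sketch, not part of the dataset release: the helper names `md5_of_file` and `verify_downloads` are hypothetical, and it assumes the three files were saved under their listed names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing on this page.
EXPECTED_MD5 = {
    "datasets.zip": "eaa0fc13767862b752604bb91210e98e",
    "LICENSE": "a23038c94120c4eb1f5297ad0fee49d6",
    "models.zip": "170318fe6ebc0353c82e1cc1269540ec",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that multi-gigabyte archives are never loaded whole into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True (digest matches),
    False (digest differs) or None (file not found in `directory`)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of_file(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, status in verify_downloads().items():
        print(f"{name}: {labels[status]}")
```

Run the script from the directory that holds the downloaded files. A `MISMATCH` usually indicates an interrupted or corrupted download, in which case the file should be fetched again.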