Dataset Open Access
Dataset containing scanned historical measurement table documents from ship logs and land measurement stations. Annotations provided in this dataset are designed to allow finergrained table detection and table structure recognition models to be trained and tested. Annotations are region boundaries for tables, cells, headings, headers and captions.
This dataset release includes code to train models on a training split, to use trained model checkpoints for inference and to evaluate interred results on a test split. Pretrained models used in the published HIP-2021 paper are included in the dataset so results can be easily reproduced without training the model checkpoints yourself.
Instructions and code can be found on the linked github repository https://github.com/stuartemiddleton/glosat_table_dataset
A pre-print of the HIP-2021 paper can be found on the authors website https://www.southampton.ac.uk/~sem03/HIP_2021.pdf
Original images sourced with permission from UK Met Office, US NOAA and weatheerrescue.org (University of Reading).
This work is part of the GloSAT project https://www.glosat.org/ and supported by the Natural Environment Research Council (NE/S015604/1). The authors acknowledge the use of the IRIDIS High Performance Computing Facility, and associated support services at the University of Southampton, in the completion of this work.