Published May 21, 2024 | Version v1 | Dataset | Open
MER dataset im2latexv2 - Part 1
Creators
- 1. Institute of Computer Science, ZHAW, 8401 Winterthur, Switzerland
- 2. People and Computing Laboratory, University of Zurich, 8050 Zurich, Switzerland
- 3. Centre for Artificial Intelligence, ZHAW, 8400 Winterthur, Switzerland
- 4. European Centre for Living Technology (ECLT), 30123 Venice, Italy
Description
Mathematical Expression Recognition Dataset im2latexv2 - Part 1
This repository contains Part 1 of the im2latexv2 dataset presented in the paper "MathNet: A Data-Centric Approach for Printed Mathematical Expression Recognition".
The dataset is an enhanced version of the im2latex-100k dataset. It applies a novel LaTeX normalization process and uses 61 rendering environments so that the rendered formula images are more realistic.
Please also download Part 2 of the im2latexv2 dataset (DOI: 10.5281/zenodo.11296280) and copy its subfolders into the Part 1 folder.
To unpack all images, please use the unpack_im2latexv2.py script.
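The Part 2 copy step can be sketched in Python as follows, assuming both archives have already been extracted into local folders. The function name `merge_part2_into_part1` and the folder paths are hypothetical and not part of the MathNet repository; unpacking the images is still done with the unpack_im2latexv2.py script.

```python
import shutil
from pathlib import Path
from typing import List


def merge_part2_into_part1(part2_dir: str, part1_dir: str) -> List[str]:
    """Copy every subfolder of the (hypothetical) extracted Part 2 folder
    into the Part 1 folder and return the names of the copied subfolders.

    Loose files at the top level of Part 2 are ignored; only subfolders
    are copied, as the dataset instructions describe.
    """
    src_root = Path(part2_dir)
    dst_root = Path(part1_dir)
    copied = []
    for sub in sorted(p for p in src_root.iterdir() if p.is_dir()):
        # dirs_exist_ok=True (Python 3.8+) merges into an existing subfolder
        # of the same name instead of raising FileExistsError.
        shutil.copytree(sub, dst_root / sub.name, dirs_exist_ok=True)
        copied.append(sub.name)
    return copied
```

For example, `merge_part2_into_part1("im2latexv2-Part2", "im2latexv2-Part1")` would merge a hypothetical extracted Part 2 folder into Part 1. `dirs_exist_ok=True` is chosen so that subfolders already present in Part 1 are merged rather than rejected.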
The CSV files have the following structure:
| formula | images | | |
|---|---|---|---|
| tokenized formula (tokens separated by whitespace) | path to image with rendering env 1 | path to image with rendering env 2 | ... |
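The row layout above could be read in Python roughly as follows. This is a sketch that assumes a comma-delimited file whose first column holds the tokenized formula and whose remaining columns each hold one image path; the actual delimiter, header row, and column layout should be checked against the CSV files themselves. `parse_rows` is a hypothetical helper.

```python
import csv
from typing import Iterable, List, Tuple


def parse_rows(lines: Iterable[str]) -> List[Tuple[List[str], List[str]]]:
    """Split each CSV row into (formula tokens, image paths).

    Assumption: column 0 is the tokenized formula, every further column is
    one image path (one per rendering environment). Empty cells are dropped.
    A header row, if the file has one, must be skipped by the caller.
    """
    rows = []
    for record in csv.reader(lines):
        if not record:
            continue  # skip blank lines
        formula, *images = record
        tokens = formula.split()  # tokens are separated by whitespace
        paths = [p for p in images if p.strip()]
        rows.append((tokens, paths))
    return rows
```

If a file starts with a header row, it can be skipped with `next(f)` on a file opened via `open(path, newline="")` before passing `f` to `parse_rows`.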
Files (40.6 GB)
| Name | Size | Checksum |
|---|---|---|
| im2latexv2-Part1.zip | 40.6 GB | md5:56147314ef7ca6c43f218fa846bb1af9 |
| | 1.0 kB | md5:8b3e4d4335d8b737c641b094ba6b37b3 |
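Because the archive is large, it is worth checking the download against the listed md5 value before unpacking. A minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `verify` are hypothetical, and the file is read in 1 MiB chunks so it never has to fit in memory.

```python
import hashlib

# md5 listed for im2latexv2-Part1.zip on this record.
EXPECTED_PART1_MD5 = "56147314ef7ca6c43f218fa846bb1af9"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_PART1_MD5) -> bool:
    """Compare a file's MD5 with the expected value (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()
```

Running `verify("im2latexv2-Part1.zip")` on the downloaded archive should return `True` if the file arrived intact; expect it to take a while on a 40.6 GB file.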
Additional details
Related works
- Is part of
  - Publication: 10.1109/ACCESS.2024.3404834 (DOI)
Software
- Repository URL: https://github.com/felix-schmitt/MathNet
- Programming language: Python