Published May 21, 2024 | Version v1
Dataset Open

MER dataset im2latexv2 - Part 1

  • 1. Institute of Computer Science, ZHAW, 8401 Winterthur, Switzerland
  • 2. People and Computing Laboratory, University of Zurich, 8050 Zurich, Switzerland
  • 3. Centre for Artificial Intelligence, ZHAW, 8400 Winterthur, Switzerland
  • 4. European Centre for Living Technology (ECLT), 30123 Venice, Italy

Description

Mathematical Expression Recognition Dataset im2latexv2 - Part 1

This repository contains Part 1 of the im2latexv2 dataset presented in the paper MathNet: A Data-Centric Approach for Printed Mathematical Expression Recognition.

The dataset is an enhanced version of the im2latex-100k dataset. It uses a novel LaTeX normalization process and 61 rendering environments to make the dataset more realistic.

Please also download Part 2 of the im2latexv2 dataset (doi: 10.5281/zenodo.11296280) and copy its subfolders into the Part 1 folder.
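The copy step can also be scripted. The following is a minimal sketch, not part of the dataset tooling: the function name merge_subfolders and the folder names in the usage comment are assumptions, and it presumes both archives have already been extracted. It requires Python 3.8 or newer for the dirs_exist_ok argument.

```python
import shutil
from pathlib import Path


def merge_subfolders(part2_dir, part1_dir):
    """Copy every subfolder of part2_dir into part1_dir.

    Returns the sorted names of the subfolders that were copied.
    Plain files directly inside part2_dir are left alone.
    """
    part1 = Path(part1_dir)
    copied = []
    for sub in sorted(Path(part2_dir).iterdir()):
        if sub.is_dir():
            # dirs_exist_ok=True merges into a folder of the same name
            # instead of raising FileExistsError.
            shutil.copytree(sub, part1 / sub.name, dirs_exist_ok=True)
            copied.append(sub.name)
    return copied


# Hypothetical usage (folder names are assumptions, adjust to your layout):
# merge_subfolders("im2latexv2-Part2", "im2latexv2-Part1")
```

Copying rather than moving leaves the extracted Part 2 folder untouched, at the cost of extra disk space.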

To unpack all images, please use the unpack_im2latexv2.py script.

The CSV files have the following structure:

formula | images
tokenized formula (tokens separated by whitespace) | path to image with rendering env 1 | path to image with rendering env 2 | ...
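A row with this layout could be read as sketched below. This is an illustration under stated assumptions, not the authors' loader: it assumes comma-delimited files in which any formula containing a comma is quoted, it does not skip a header row (check whether your files have one), and the function name read_rows and the sample paths are hypothetical.

```python
import csv
import io


def read_rows(handle, delimiter=","):
    """Yield (tokens, image_paths) for each non-empty CSV row.

    The first column is the tokenized formula; every remaining
    non-empty column is treated as a path to one rendered image.
    """
    for row in csv.reader(handle, delimiter=delimiter):
        if not row:
            continue
        formula, *paths = row
        tokens = formula.split()  # tokens are separated by whitespace
        image_paths = [p for p in paths if p]
        yield tokens, image_paths


# Made-up sample row; real files use the dataset's own paths.
sample = io.StringIO("\\frac { a } { b },env1/0001.png,env2/0001.png\n")
for tokens, image_paths in read_rows(sample):
    print(tokens, image_paths)
```

For a real file, pass an open file handle (opened with newline="") in place of the in-memory sample.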

Files

im2latexv2-Part1.zip

Files (40.6 GB)

Name Size
md5:56147314ef7ca6c43f218fa846bb1af9 40.6 GB
md5:8b3e4d4335d8b737c641b094ba6b37b3 1.0 kB

Additional details

Related works

Is part of
Publication: 10.1109/ACCESS.2024.3404834 (DOI)

Software

Repository URL
https://github.com/felix-schmitt/MathNet
Programming language
Python