3483048
doi
10.5281/zenodo.3483048
oai:zenodo.org:3483048
user-iapr-tc11
InftyMCCDB-2 dataset
Mahshad Mahdavi
Rochester Institute of Technology
info:eu-repo/semantics/openAccess
Offline Recognition
Math recognition
Typeset Equations
<p>The InftyMCCDB-2 dataset is a modified version of InftyCDB-2, which contains mathematical expressions from scanned article pages.</p>
<p>The original dataset has 21,056 math expressions. We removed formulas with matrices and grids, leaving 19,381 formulas. The dataset includes 213 symbol classes and is split into two sets, training (12,551 images) and testing (6,830 images), with approximately the same distribution of symbol classes and relation classes. The expressions range in size from a single symbol to more than 75 symbols, with an average of 7.33 symbols per expression.</p>
<p>The original InftyCDB-2 provides ground truth at the symbol level. We extracted connected component bounding boxes and generated new ground truth for each image using a labeled adjacency matrix ("label graph") representation.</p>
<p>The .lg (label graph) ground truth files are provided, along with a .png image for each expression.</p>
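<p>As a rough illustration of the labeled adjacency matrix idea, the sketch below builds such a matrix for a toy expression, x^2 + 1. It does not read the dataset's .lg files. The primitive ids (c0 to c3), the relation names (Sup, Right), the "_" placeholder for "no relation", and the helper build_label_graph are all hypothetical choices made for this example.</p>

```python
# Illustrative sketch only: primitive ids, relation names, and the "_"
# placeholder are assumptions, not values taken from the dataset's .lg files.

def build_label_graph(symbols, relations, no_relation="_"):
    """Return (ids, matrix) for a labeled adjacency matrix.

    Diagonal cells hold the symbol label of each primitive (for example a
    connected component); off-diagonal cells hold the label of the directed
    relation from the row primitive to the column primitive.
    """
    ids = list(symbols)
    index = {pid: i for i, pid in enumerate(ids)}
    n = len(ids)
    matrix = [[no_relation] * n for _ in range(n)]
    for pid, label in symbols.items():
        matrix[index[pid]][index[pid]] = label
    for (parent, child), label in relations.items():
        matrix[index[parent]][index[child]] = label
    return ids, matrix


# Toy expression x^2 + 1 with four primitives.
symbols = {"c0": "x", "c1": "2", "c2": "+", "c3": "1"}
relations = {
    ("c0", "c1"): "Sup",    # "2" is a superscript of "x"
    ("c0", "c2"): "Right",  # "+" is to the right of "x"
    ("c2", "c3"): "Right",  # "1" is to the right of "+"
}

ids, matrix = build_label_graph(symbols, relations)
for pid, row in zip(ids, matrix):
    print(pid, row)
```

<p>Keeping symbol labels on the diagonal and relation labels off the diagonal lets a single matrix carry both the classification and the layout structure of an expression.</p>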
Zenodo
2019-10-11
info:eu-repo/semantics/other
3483047
user-iapr-tc11
0.0
1579893952.35914
8512828
md5:429d8a488ace9b4203b8c131fed5ffd3
https://zenodo.org/records/3483048/files/LG_test.zip
35539450
md5:6d82935ac1f8d2b511c08aaf75593d25
https://zenodo.org/records/3483048/files/LG.zip
27194383
md5:0ed9bcd2759e391202d64bd56edfc955
https://zenodo.org/records/3483048/files/IMG.zip
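The MD5 checksums listed above can be used to check downloaded archives. The following is a minimal sketch, assuming each checksum belongs to the file URL that follows it in this record and that the archives have already been saved under their original names. The helper names md5_of and verify_downloads are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksums copied from this record; the pairing of each checksum with the
# URL listed right after it is an assumption.
EXPECTED_MD5 = {
    "LG_test.zip": "429d8a488ace9b4203b8c131fed5ffd3",
    "LG.zip": "6d82935ac1f8d2b511c08aaf75593d25",
    "IMG.zip": "0ed9bcd2759e391202d64bd56edfc955",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected archive name to True if present with a matching MD5."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(name, "OK" if ok else "missing or checksum mismatch")
```

Reading in chunks keeps memory use flat for the larger archives.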
public
10.5281/zenodo.3483047
isVersionOf
doi