Published May 10, 2022 | Version 1.0
Dataset Open

Wikipedia rendered as synthetic handwriting

Authors/Creators

  • 1. Brigham Young University

Description

This is the synthetic handwriting data used to pre-train Dessurt (https://arxiv.org/abs/2203.16618).

The text is sampled from Wikipedia and rendered as handwriting with the method described in "Text and Style Conditioned GAN for Generation of Offline Handwriting Lines" (https://arxiv.org/abs/2009.00678). More data can be generated with this code: https://github.com/herobd/handwriting_line_generation

Inside the tar is a single directory containing ~800k generated handwriting line images ("sample_0.png", "sample_123.png", "sample_3292524.png", etc.) and "OUT.txt", which holds the ground truth (GT) text for each line image.
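
The layout described above can be checked without unpacking the whole archive. The following is a minimal sketch, not part of the dataset's own tooling: the function names are hypothetical, the archive path is whatever you saved the download as, and the internal format of "OUT.txt" is not documented here, so the sketch only locates that file and does not parse it.

```python
import re
import tarfile
from pathlib import PurePosixPath

# Image names follow the pattern shown in the description: sample_<index>.png
SAMPLE_NAME = re.compile(r"^sample_(\d+)\.png$")


def sample_index(member_name):
    """Return the integer index from a name like 'dir/sample_123.png', else None."""
    match = SAMPLE_NAME.match(PurePosixPath(member_name).name)
    return int(match.group(1)) if match else None


def summarize_archive(tar_path):
    """Stream through the tar once, counting line images and locating OUT.txt."""
    image_count = 0
    ground_truth_member = None
    with tarfile.open(tar_path, "r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue
            if sample_index(member.name) is not None:
                image_count += 1
            elif PurePosixPath(member.name).name == "OUT.txt":
                ground_truth_member = member.name
    return image_count, ground_truth_member
```

Calling summarize_archive on the downloaded tar should report roughly 800k images and the path of "OUT.txt" inside the archive. Inspect the first few lines of that file before writing a parser for it.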

Files (24.7 GB)

md5:edbdd960d2b294742807851e5734cc66
Size: 24.7 GB
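
Because the download is large, it is worth checking it against the MD5 checksum listed above before extracting. A minimal sketch, assuming the archive has been saved locally; the function names are hypothetical, and the file is read in chunks so that the 24.7 GB archive never has to fit in memory.

```python
import hashlib

# Checksum published on this record.
EXPECTED_MD5 = "edbdd960d2b294742807851e5734cc66"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """Return True if the file's MD5 equals the checksum listed on this record."""
    return md5_of_file(path) == EXPECTED_MD5
```

If matches_published_checksum returns False, the download is likely incomplete or corrupted and should be repeated.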