OCR-VQGAN trained on Paper2Fig100k
- 1. Computer Vision Center, Barcelona
- 2. ServiceNow Research
- 3. ÉTS Montreal
Description
OCR-VQGAN: Taming Text-within-Image Generation
Synthetic image generation has recently seen significant improvements in domains such as natural images and art. However, the problem of figure and diagram generation remains unexplored. A challenging aspect of generating figures and diagrams is effectively rendering readable text within the images. To alleviate this problem, we present OCR-VQGAN, an image encoder and decoder that leverages OCR pre-trained features to optimize a text perceptual loss, encouraging the architecture to preserve high-fidelity text and diagram structure.
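To illustrate the idea, here is a minimal sketch of such a text perceptual loss in PyTorch. The OCR backbone (`ocr_backbone`), the layer selection, and the weighting are hypothetical placeholders rather than the exact loss from the paper; the point is to compare intermediate features of a frozen OCR-pretrained network between the input and the reconstruction, in the spirit of LPIPS:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OCRPerceptualLoss(nn.Module):
    """Sketch of a text perceptual loss: compare intermediate features of a
    frozen OCR-pretrained backbone between an image and its reconstruction.
    `ocr_backbone` is a hypothetical stand-in for any OCR feature extractor
    that returns a list of feature maps."""

    def __init__(self, ocr_backbone: nn.Module, layer_weights=None):
        super().__init__()
        self.backbone = ocr_backbone.eval()
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # OCR features stay frozen during training
        self.layer_weights = layer_weights

    @staticmethod
    def _normalize(feat, eps=1e-8):
        # Channel-wise unit normalization, as in LPIPS-style losses.
        norm = feat.pow(2).sum(dim=1, keepdim=True).sqrt()
        return feat / (norm + eps)

    def forward(self, x, x_rec):
        feats_x = self.backbone(x)      # feature maps for the input image
        feats_r = self.backbone(x_rec)  # feature maps for the reconstruction
        weights = self.layer_weights or [1.0] * len(feats_x)
        loss = x.new_zeros(())
        for w, fx, fr in zip(weights, feats_x, feats_r):
            loss = loss + w * F.mse_loss(self._normalize(fx), self._normalize(fr))
        return loss
```

In the VQGAN framework, a term like this would be added to the reconstruction objective alongside the codebook and adversarial losses.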
Here we provide the pre-trained model trained on the Paper2Fig100k dataset, which downsamples images by a factor of f=16, using a discrete codebook of 16,384 entries and latent vectors of dimension 256. Refer to github.com/joanrod/ocr-vqgan/ for the implementation and details.
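As a usage sketch, encoding and decoding a figure image with the released checkpoint might look like the following. The file names inside the zip (`config.yaml`, `model.ckpt`) and the config layout are assumptions based on the taming-transformers codebase that OCR-VQGAN builds on; check the repository for the actual layout:

```python
import torch
from omegaconf import OmegaConf
from taming.models.vqgan import VQModel  # from the taming-transformers codebase

# Paths inside the released zip are assumptions; adjust to the actual layout.
config = OmegaConf.load("ocr-vqgan-f16-c16384-d256/config.yaml")
ckpt = torch.load("ocr-vqgan-f16-c16384-d256/model.ckpt", map_location="cpu")

model = VQModel(**config.model.params)
model.load_state_dict(ckpt["state_dict"], strict=False)
model.eval()

# Encode an image to discrete codes and decode it back.
# x is a batch in [-1, 1], shape (B, 3, H, W) with H and W divisible by 16.
x = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    quant, _, _ = model.encode(x)  # (1, 256, 16, 16): f=16 downsampling
    x_rec = model.decode(quant)    # reconstructed image
```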
Our paper at WACV 2023 presents how we design an OCR perceptual loss for use in the VQGAN framework (OCR-VQGAN). The paper also introduces the proposed Paper2Fig100k dataset.
Files

| Name | MD5 | Size |
|---|---|---|
| ocr-vqgan-f16-c16384-d256.zip | md5:606c21f29a9ff186b29ec2c8d7dddfb6 | 961.7 MB |
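To verify the download against the listed checksum, a minimal sketch (the filename and hash are taken from the table above):

```python
import hashlib

def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

assert md5sum("ocr-vqgan-f16-c16384-d256.zip") == "606c21f29a9ff186b29ec2c8d7dddfb6"
```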