1 Million Captioned Dutch Newspaper Images

doi:10.5281/zenodo.844462

Published May 23, 2016 | Version v1

Conference paper Open

1. University of Amsterdam
2. National Library of the Netherlands

Images naturally appear alongside text in a wide variety of media, such as books, magazines, newspapers, and online articles. This type of multi-modal data offers an interesting basis for vision and language research, but most existing datasets use crowdsourced text, which removes the images from their original context. In this paper, we introduce the KBK-1M dataset of 1.6 million images in their original context, with co-occurring texts found in Dutch newspapers from 1922 to 1994. The images are digitally scanned photographs, cartoons, sketches, and weather forecasts; the text is generated from OCR-scanned blocks. The dataset is suitable for experiments in automatic image captioning, image-article matching, object recognition, and data-to-text generation for weather forecasting. It can also be used by humanities scholars to analyse photographic style changes and the representation of people and societal issues, and as a basis for new tools for exploring photograph reuse via image-similarity-based search.

Notes

See also http://www.lrec-conf.org/proceedings/lrec2016/summaries/448.html

Files

Files (1.8 MB)

Name	Size	Checksum
448_Paper.pdf	1.8 MB	md5:00ba902239c949d56dbb702c3888f8bf

	All versions	This version
Views	146	146
Downloads	59	59
Data volume	107.9 MB	107.9 MB

Resource type

Conference paper

Publisher

European Language Resources Association (ELRA)

Imprint

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Paris, France. ISBN: 978-2-9517408-9-1.

Conference

Language Resources and Evaluation (LREC), Portorož, Slovenia, 23-28 May 2016

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: August 17, 2017
Modified: January 20, 2020
