Published October 14, 2021 | Version 0.0.1
Dataset Open

19th Century United States Newspaper Advert images with 'illustrated' or 'non illustrated' labels

  • 1. British Library

Contributors

Data collector:

Description

This dataset contains images derived from Newspaper Navigator (news-navigator.labs.loc.gov/), a dataset of images drawn from the Library of Congress Chronicling America collection (chroniclingamerica.loc.gov/).

[The Newspaper Navigator dataset] consists of extracted visual content for 16,358,041 historic newspaper pages in Chronicling America. The visual content was identified using an object detection model trained on annotations of World War 1-era Chronicling America pages, including annotations made by volunteers as part of the Beyond Words crowdsourcing project.

source: https://news-navigator.labs.loc.gov/

One of the categories of visual content in the Newspaper Navigator dataset is 'advertisements'. This dataset contains a sample of these images with additional labels indicating whether the advert is 'illustrated' or 'not illustrated'.

The data is organised as follows:

  • `images.zip` contains the images themselves.
  • `newspaper-navigator-sample-metadata.csv` contains metadata about each image, drawn from the Newspaper Navigator dataset.
  • `ads.csv` contains the labels for the images.
  • `sample.csv` contains additional metadata about the images (based on the newspapers those images came from).
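
As a rough illustration of working with the label file, the sketch below counts how often each value occurs in one column of a CSV file. It uses only the Python standard library. The column name `label` in the usage comment is an assumption, not something this record states, so check the header of `ads.csv` before relying on it.

```python
import csv
from collections import Counter


def count_labels(csv_file, label_column):
    """Count how often each value appears in one column of an open CSV file."""
    reader = csv.DictReader(csv_file)
    # Fail early with a readable message if the assumed column is absent.
    if reader.fieldnames is None or label_column not in reader.fieldnames:
        raise KeyError(f"{label_column!r} not in header: {reader.fieldnames}")
    return Counter(row[label_column] for row in reader)


# Hypothetical usage; "label" is an assumed column name:
# with open("ads.csv", newline="", encoding="utf-8") as f:
#     print(count_labels(f, "label"))
```

Passing an open file object rather than a path keeps the function easy to try on a small in-memory sample before running it on the real file.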

This dataset was created for use in an under-review Programming Historian tutorial (http://programminghistorian.github.io/ph-submissions/lessons/computer-vision-deep-learning-pt1). The primary aim was to provide a realistic example dataset for teaching computer vision with digitised heritage material. The data is shared here since it may be useful to others. This data documentation is a work in progress and will be updated when the Programming Historian tutorial is released publicly.

The metadata CSV file contains the following columns:

- filepath
- pub_date
- page_seq_num
- edition_seq_num
- batch
- lccn
- box
- score
- ocr
- place_of_publication
- geographic_coverage
- name
- publisher
- url
- page_url
- month
- year
- iiif_url
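
The columns listed above can be read with Python's standard `csv` module. The following is a minimal sketch that selects the `filepath` of every row for a given `year`. It assumes that `year` is stored as plain integer text (for example `1860`, not `1860.0`) and that the metadata file in question is `newspaper-navigator-sample-metadata.csv`. Neither point is confirmed by this record.

```python
import csv


def filepaths_for_year(csv_file, year):
    """Return the `filepath` of every row whose `year` column equals `year`."""
    reader = csv.DictReader(csv_file)
    wanted = str(year)
    # Assumption: `year` holds plain integer text such as "1860".
    return [row["filepath"] for row in reader if row["year"].strip() == wanted]


# Hypothetical usage; the file name is assumed from the file list:
# with open("newspaper-navigator-sample-metadata.csv", newline="", encoding="utf-8") as f:
#     print(filepaths_for_year(f, 1860))
```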

Files

Files (49.1 MB)

  • 47.4 kB, md5:08bb5190ef001213c519d1add11e76be
  • 68.3 kB, md5:547c8245118bc61f30d318bb9ec9d19d
  • 48.1 MB, md5:1055d043631231b884a1be9a56a76ee5
  • 876.4 kB, md5:39b746509b5468a7092f7b9ff510537d
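
The md5 checksums listed above can be used to check that a download is intact. The sketch below uses the standard `hashlib` module. The function name `md5_matches` is illustrative, and the pairing of file name to checksum in the usage comment is an assumption, since the listing does not show which checksum belongs to which file.

```python
import hashlib


def md5_matches(path, expected, chunk_size=1 << 20):
    """Return True if the file's md5 digest equals `expected`.

    `expected` may be given with or without the "md5:" prefix.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return digest.hexdigest() == expected_hex


# Hypothetical usage; which checksum belongs to which file is an assumption:
# md5_matches("images.zip", "md5:1055d043631231b884a1be9a56a76ee5")
```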

Additional details

Related works

Is derived from
Journal article: https://arxiv.org/abs/2005.01583 (URL)
Requires
Software: 10.5281/zenodo.5537185 (DOI)

Funding

Living with Machines AH/S01179X/1
UK Research and Innovation