There is a newer version of the record available.

Published May 21, 2021 | Version 1.0.0
Dataset Open

Neural Language Models for Nineteenth-Century English (dataset; language model zoo)

  • 1. The Alan Turing Institute, London, UK
  • 2. University of Amsterdam, Institute for Logic, Language and Computation, Netherlands

Description

This dataset contains four types of neural language models trained on a large historical dataset of English-language books published between 1760 and 1900, comprising ~5.1 billion tokens. The architectures include static models (word2vec and fastText) and contextualized models (BERT and Flair).

GitHub repository: https://github.com/Living-with-machines/histLM

Files

bert.zip

Files (13.2 GB)

MD5 checksum                        Size
fea637f1dd685fef5301490ee9cffbb0    2.0 GB
f60c2b92ea99e6e2245bbbaca82b427f    8.5 GB
0f29ad54b98a841fe57e7e5b003b180c    71.0 MB
f436627a9bba8f53174d0975c36fc72a    3.1 kB
47f7ff9d77bf61ff2a20d7c641ca38af    2.6 GB
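
The MD5 checksums listed above can be used to confirm that a downloaded archive arrived intact. The following is a minimal sketch in Python, not part of the record itself. The names LISTED_MD5, md5_of_file and is_listed are hypothetical helpers introduced here for illustration. Because the listing does not show which file name belongs to which checksum, the sketch only tests whether a local file's digest matches any of the five listed values.

```python
import hashlib
import sys

# MD5 digests and sizes exactly as listed in the record's file table.
# The listing does not pair these with file names, so none are assumed here.
LISTED_MD5 = {
    "fea637f1dd685fef5301490ee9cffbb0": "2.0 GB",
    "f60c2b92ea99e6e2245bbbaca82b427f": "8.5 GB",
    "0f29ad54b98a841fe57e7e5b003b180c": "71.0 MB",
    "f436627a9bba8f53174d0975c36fc72a": "3.1 kB",
    "47f7ff9d77bf61ff2a20d7c641ca38af": "2.6 GB",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path):
    """Return True if the file's MD5 digest is one of the listed checksums."""
    return md5_of_file(path) in LISTED_MD5


if __name__ == "__main__":
    # Usage (paths are whatever you saved the downloads as):
    #   python verify_md5.py <downloaded-file> [<downloaded-file> ...]
    for file_path in sys.argv[1:]:
        checksum = md5_of_file(file_path)
        status = "matches a listed checksum" if checksum in LISTED_MD5 else "NOT listed"
        print(f"{file_path}: {checksum} ({status})")
```

A digest that is not in the list usually points to an incomplete or corrupted download, or to a file from a different version of the record.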

Additional details

Funding

UK Research and Innovation: Living with Machines (AH/S01179X/1)
UK Research and Innovation: The Alan Turing Institute (EP/N510129/1)