Published May 21, 2021 | Version 1.0.0
Dataset
Open
Neural Language Models for Nineteenth-Century English (dataset; language model zoo)
Authors/Creators
- 1. The Alan Turing Institute, London, UK
- 2. University of Amsterdam, Institute for Logic, Language and Computation, Netherlands
Description
This dataset contains four types of neural language models trained on a large historical dataset of English-language books published between 1760 and 1900, comprising ~5.1 billion tokens. The model architectures include static embeddings (word2vec and fastText) and contextualized models (BERT and Flair).
GitHub repository: https://github.com/Living-with-machines/histLM
Files
13.2 GB in total; the record page previews bert.zip.
| MD5 checksum | Size |
|---|---|
| fea637f1dd685fef5301490ee9cffbb0 | 2.0 GB |
| f60c2b92ea99e6e2245bbbaca82b427f | 8.5 GB |
| 0f29ad54b98a841fe57e7e5b003b180c | 71.0 MB |
| f436627a9bba8f53174d0975c36fc72a | 3.1 kB |
| 47f7ff9d77bf61ff2a20d7c641ca38af | 2.6 GB |
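Because several of the archives are multiple gigabytes, it can be worth checking a download against the MD5 checksums listed above before unpacking it. The following is a minimal sketch using only the Python standard library. The helper names `md5sum` and `is_known_file` are hypothetical and not part of the dataset or the histLM repository, and because the listing does not pair file names with checksums, the check only tests membership in the set of published checksums.

```python
import hashlib

# MD5 checksums copied from the file listing of this record.
# The listing does not say which checksum belongs to which file name,
# so they are kept as an unordered set.
KNOWN_MD5 = {
    "fea637f1dd685fef5301490ee9cffbb0",
    "f60c2b92ea99e6e2245bbbaca82b427f",
    "0f29ad54b98a841fe57e7e5b003b180c",
    "f436627a9bba8f53174d0975c36fc72a",
    "47f7ff9d77bf61ff2a20d7c641ca38af",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that multi-gigabyte archives are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known_file(path):
    """Return True if the file's MD5 matches one of the published checksums."""
    return md5sum(path) in KNOWN_MD5
```

For example, `is_known_file("bert.zip")` would return `True` for an intact download, assuming the archive was saved under that local name in the current directory. A `False` result suggests a truncated or corrupted download.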
Additional details
Funding
- UK Research and Innovation: Living with Machines (AH/S01179X/1)
- UK Research and Innovation: The Alan Turing Institute (EP/N510129/1)