Published January 4, 2017 | Version v1
Dataset Open

English Wikipedia

  • Language Technology Group, TU Darmstadt, Germany

Description

This text corpus consists of English Wikipedia article text, extracted from the Wikipedia dump of 26 September 2015 with the WikiExtractor tool (https://github.com/attardi/wikiextractor).

Files

Total size: 4.5 GB
MD5 checksum: c1fefd53b73798459e8623e2b965f75f
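
Because the download is large, it may be worth checking the file against the MD5 checksum listed above before using it. The following is a minimal sketch in Python, not part of the original record: the function names and the file path in the usage note are hypothetical, and the file is read in chunks so that the full 4.5 GB never has to fit in memory.

```python
import hashlib

# MD5 checksum as listed in the Files section of this record.
EXPECTED_MD5 = "c1fefd53b73798459e8623e2b965f75f"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_listed_checksum("path/to/downloaded_file")` (a placeholder path; substitute wherever the download was saved) returns `True` only if the file is byte-for-byte identical to the one the checksum was computed from. A `False` result most likely indicates an incomplete or corrupted download.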