Published December 15, 2021 | Version v5
Dataset Open

Text of Wikisource pages of German magazine 'Die Gartenlaube'

Description

Text of all Gartenlaube pages transcribed in German Wikisource. The text was parsed on 2021-12-13, and the output is combined into separate JSON files, one file per volume, starting with 1853 and ending with 1899. All 47 JSON files are compressed into one *.tar.xz file.

Each JSON file is a list of page objects with the following structure:

  [{"pageid"   : {PAGEID},
    "title"    : {PAGETITLE},
    "lastrevid": {REVISIONID},
    "proofread": {{JSON_OBJECT_Proofread_Status}},
    "html"     : {HTML_OUTPUT},
    "wikitext" : {WIKI_MARKUP},
    "plaintxt" : {mwparserfromhell.parse(WIKI_MARKUP).strip_code()}
  }]
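
The structure above can be read directly from the compressed archive without unpacking it first. The following is a minimal sketch in Python using only the standard library. It assumes that the per-volume files inside the archive have names ending in `.json`; the record does not state the member names, so that filter, the helper name `iter_pages`, and the archive path in the usage comment are hypothetical.

```python
import json
import tarfile


def iter_pages(archive_path):
    """Yield (member_name, page) for every page object in every JSON file.

    Assumption: the per-volume files inside the archive end in ".json".
    """
    with tarfile.open(archive_path, mode="r:xz") as archive:
        for member in archive:
            # Skip directories and anything that is not a JSON file.
            if not (member.isfile() and member.name.endswith(".json")):
                continue
            with archive.extractfile(member) as handle:
                # Each file holds a list of page objects.
                pages = json.load(handle)
            for page in pages:
                yield member.name, page


# Usage (the path is a placeholder, not the real file name):
# for volume, page in iter_pages("gartenlaube.tar.xz"):
#     print(volume, page["pageid"], page["title"], len(page["plaintxt"]))
```

Reading one member at a time keeps memory use bounded by the size of a single volume rather than the whole dataset.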

Files

Files (105.1 MB)

Size: 105.1 MB
md5:05bff99558d17f29e7c569e2c78bcb0d
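
After downloading, the archive can be checked against the md5 checksum listed above. The following is a small sketch in Python; the function name `md5_of_file` and the file path in the usage comment are illustrative assumptions, since the record does not give the archive's file name.

```python
import hashlib

# Checksum as listed in this record.
EXPECTED_MD5 = "05bff99558d17f29e7c569e2c78bcb0d"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (the path is a placeholder, not the real file name):
# if md5_of_file("gartenlaube.tar.xz") != EXPECTED_MD5:
#     raise ValueError("checksum mismatch, the download may be corrupted")
```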

Additional details