Published December 15, 2021 | Version v5
Dataset
Open
Text of Wikisource pages of German magazine 'Die Gartenlaube'
Creators
Description
Text of all Gartenlaube pages transcribed in German Wikisource. The text was parsed on 2021-12-13, and the output is combined into separate JSON files, one file per volume, starting with 1853 and ending with 1899. All 47 JSON files are compressed into one *.tar.xz file.
Each JSON file is a list of page records with the following structure:
[{"pageid" : {PAGEID},
"title" : {PAGETITLE},
"lastrevid" : {REVISIONID},
"proofread" : {{JSON_OBJECT_Proofread_Status}},
"html" : {HTML_OUTPUT},
"wikitext": {WIKI_MARKUP},
"plaintxt": {mwparserfromhell(WIKI_MARKUP).strip_code()}
}]
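As a rough illustration of how the archive could be read, the following sketch iterates over every page record without unpacking the archive to disk. It assumes that the per-volume files inside the archive carry a `.json` suffix and that each one holds a list of records as outlined above; the function name `iter_pages` is hypothetical and not part of the dataset or its tooling.

```python
import json
import tarfile
from typing import Dict, Iterator, Tuple


def iter_pages(archive_path: str) -> Iterator[Tuple[str, Dict]]:
    """Yield (member_name, page_record) for every page in every JSON file."""
    # "r:xz" opens an LZMA-compressed tar archive for reading.
    with tarfile.open(archive_path, mode="r:xz") as archive:
        for member in archive:
            # Assumption: the per-volume files are regular files ending in .json.
            if not (member.isfile() and member.name.endswith(".json")):
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            with handle:
                # Each file is expected to contain a JSON list of page records.
                pages = json.load(handle)
            for page in pages:
                yield member.name, page
```

Calling `iter_pages` with the path of the downloaded archive yields pairs from which fields such as `page["title"]` or `page["plaintxt"]` can be read. Keeping the function a generator means only one volume is held in memory at a time.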
Files
Checksum | Size |
---|---|
md5:05bff99558d17f29e7c569e2c78bcb0d | 105.1 MB |
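The MD5 checksum listed for the archive can be used to confirm that a download is complete. The sketch below is one way to do that with the standard library; the helper names `md5_of_file` and `matches_published_checksum` are hypothetical, and the path of the downloaded file has to be supplied by the caller because the file name is not given here.

```python
import hashlib

# Checksum listed for the archive in the Files section.
EXPECTED_MD5 = "05bff99558d17f29e7c569e2c78bcb0d"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so the whole archive never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str) -> bool:
    """Compare a local file's MD5 digest with the published value."""
    return md5_of_file(path) == EXPECTED_MD5
```

MD5 is adequate here for detecting a truncated or corrupted download; it is not a safeguard against deliberate tampering.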
Additional details
Related works
- Is derived from
- Software: https://github.com/DieDatenlaube/DieDatenlaube/blob/master/Get_Gartenlaube_SeitenText.ipynb (URL)
- Is documented by
- Other: https://diedatenlaube.github.io/Get_Gartenlaube_SeitenText (URL)
- https://www.wikidata.org/wiki/Q110333706 (URL)