Published June 16, 2023 | Version v1
Dataset
Open
Webis Wikipedia Innovation History 2023
- 1. Leipzig University and ScaDS.AI
- 2. DZHW Berlin
- 3. Technische Universität Berlin
- 4. Bauhaus-Universität Weimar
Description
This corpus contains the history sections of science and technology articles on Wikipedia, extracted from the Wikimedia dump of 1 January 2022. Articles were retrieved using Wikipedia's category network. History sections were extracted using a combination of section-heading-based heuristics and classifiers trained on articles with designated history sections.
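To illustrate what a section-heading-based heuristic can look like, here is a minimal sketch. It is not the method used to build this corpus: the keyword list, the function names `is_history_heading` and `select_history_sections`, and the `(heading, text)` input shape are all assumptions made for illustration. The classifier stage mentioned in the description is not covered.

```python
import re

# Hypothetical keyword list; the heuristics actually used for the corpus
# are described in the paper, not on this page.
HISTORY_HEADING = re.compile(
    r"\b(history|origins?|development|background)\b", re.IGNORECASE
)


def is_history_heading(heading: str) -> bool:
    """Return True if a section heading looks like it introduces a history section."""
    return bool(HISTORY_HEADING.search(heading))


def select_history_sections(sections):
    """Keep the (heading, text) pairs whose heading matches the heuristic."""
    return [(heading, text) for heading, text in sections if is_history_heading(heading)]
```

A heuristic like this is cheap but brittle, which is one plausible reason to combine it with trained classifiers for articles whose history content sits under a less obvious heading.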
If you use this corpus, please cite the following paper:
Wolfgang Kircheis, Marion Schmidt, Arno Simons, Benno Stein, and Martin Potthast. Mining the History Sections of Wikipedia Articles on Science and Technology. In 23rd ACM/IEEE Joint Conference on Digital Libraries (JCDL 2023), June 2023.
@InProceedings{kircheis:2023,
  author =    {Wolfgang Kircheis and Marion Schmidt and Arno Simons and Benno Stein and Martin Potthast},
  booktitle = {23rd {ACM/IEEE} Joint Conference on Digital Libraries (JCDL 2023)},
  codeurl =   {https://github.com/webis-de/JCDL-23},
  keywords =  {nlp, natural language processing},
  month =     jun,
  title =     {{Mining the History Sections of Wikipedia Articles on Science and Technology}},
  year =      2023
}
Files
Name | Size |
---|---|
webis-WikiSciTech-23.json (md5:6d060feac286ff7bc38290451f9aa818) | 26.2 MB |
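The MD5 checksum listed for the file can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `verify_download` are illustrative, and the local path assumes the file was saved under its original name in the current directory.

```python
import hashlib
from pathlib import Path

# Checksum as listed in the file table of this record.
EXPECTED_MD5 = "6d060feac286ff7bc38290451f9aa818"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the file was saved.
    local_path = Path("webis-WikiSciTech-23.json")
    if local_path.exists():
        print("checksum OK" if verify_download(local_path) else "checksum mismatch")
    else:
        print(f"{local_path} not found")
```

Reading in chunks keeps memory use constant, although at 26.2 MB the file would also fit comfortably in memory.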