DEplain-APA
- 1. Heinrich Heine University
- 2. APA - Austria Press Agency eG
Description
DEplain: A corpus for German Text Simplification
This repository contains the DEplain-APA corpus for German text simplification (document and sentence simplification). The corpus contains Austrian news texts provided by the APA - Austria Presse Agentur eG. All sentence-level aligned pairs (complex-simple) were aligned manually. The following table summarizes the most important metadata of the corpus.
| metadata | value |
| --- | --- |
| language | DE-AT (Austrian German) |
| domain | news |
| source language level | B1 |
| target language level | A2 |
| # document pairs (total, train/dev/test) | 483 (387/48/48) |
| # sentence pairs (total, train/dev/test) | 13,122 (10,660/1,231/1,231) |
| # complex sentences | 25,607 |
| # simple sentences | 26,471 |
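
The split sizes in the table can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the corpus tooling: the dictionary `SPLITS` and the helper `split_shares` are hypothetical names introduced here, and the numbers are copied from the table above. It confirms that the train/dev/test counts add up to the reported totals and shows each split's approximate share.

```python
# Split sizes copied from the metadata table above.
# SPLITS and split_shares are illustrative names, not part of the corpus.
SPLITS = {
    "document pairs": {"train": 387, "dev": 48, "test": 48, "total": 483},
    "sentence pairs": {"train": 10_660, "dev": 1_231, "test": 1_231, "total": 13_122},
}


def split_shares(counts):
    """Return each split's share of the reported total, in percent (one decimal)."""
    parts = {name: n for name, n in counts.items() if name != "total"}
    # Guard against a mismatch between the split sizes and the reported total.
    if sum(parts.values()) != counts["total"]:
        raise ValueError("split sizes do not add up to the reported total")
    return {name: round(100 * n / counts["total"], 1) for name, n in parts.items()}


for unit, counts in SPLITS.items():
    print(unit, split_shares(counts))
```

Under these numbers, both document and sentence pairs follow a split of roughly 80/10/10.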
For more information, please see our paper. If you use this corpus, please cite the paper and name APA - Austria Presse Agentur eG as the data provider:
Regina Stodden, Omar Momen, and Laura Kallmeyer. 2023. DEplain: A German Parallel Corpus with Intralingual Translations into Plain Language for Sentence and Document Simplification. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16441–16463, Toronto, Canada. Association for Computational Linguistics.