Published September 30, 2017
| Version v1
Conference paper
Open
Anonymisation of the Dortmund Chat Corpus 2.1
- 1. Institute for the German Language, Germany
- 2. University Duisburg-Essen, Germany
- 3. University of Mannheim, Germany
Description
Following a recent curation project, the Dortmund Chat Corpus is available for download and querying in the CLARIN-D research infrastructures. A legal opinion had recommended that standard anonymisation measures be applied to the corpus before it could be republished. This paper reports on the anonymisation campaign conducted for the corpus. Anonymisation was realised as categorisation: the paper introduces the taxonomy of anonymisation categories and demonstrates how it was applied to the TEI files. It then discusses the results of the campaign and issues of quality management. Finally, it discusses pseudonymisation as an alternative to categorisation for anonymising CMC data in general, along with possibilities for (partially) automating the process.
Files
Name | Size |
---|---|
cmccorpora17-26.pdf (md5:e02d41519220f799860c5859b54b6d6c) | 383.0 kB |
Additional details
Related works
- Is part of
- Conference proceeding: 10.5281/zenodo.1040713 (DOI)