Anonymisation of the Dortmund Chat Corpus 2.1

Lüngen, Harald; Beißwenger, Michael; Herzberg, Laura; Pichler, Cathrin

doi:10.5281/zenodo.1041873

Published September 30, 2017 | Version v1

Conference paper | Open access

1. Institute for the German Language, Germany
2. University Duisburg-Essen, Germany
3. University of Mannheim, Germany

As a result of a recent curation project, the Dortmund Chat Corpus is available for download and querying in CLARIN-D research infrastructures. A legal opinion had recommended that standard anonymisation measures be applied to the corpus before it could be republished. This paper reports on the anonymisation campaign conducted for the corpus. Anonymisation was realised as categorisation: the paper introduces the taxonomy of anonymisation categories that was applied and demonstrates how it was applied to the TEI files. The results of the campaign and issues of quality management are discussed. Finally, the paper discusses pseudonymisation as an alternative to categorisation for anonymising CMC data in general, as well as possibilities for (partially) automating the process.
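The difference between the two strategies named in the abstract can be illustrated with a minimal sketch. It is not the procedure used in the paper: the sample fragment, the `type` values on `<name>`, the bracketed category labels, and the substitute names are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element and attribute usage is illustrative only,
# not the actual TEI encoding of the Dortmund Chat Corpus.
SAMPLE = (
    '<post who="#A01">'
    'hallo <name type="person">Anna</name>, bist du noch in '
    '<name type="place">Dortmund</name>?'
    '</post>'
)

# Hypothetical category labels, not the taxonomy introduced in the paper.
CATEGORY_LABELS = {"person": "[_PERSON-NAME_]", "place": "[_PLACE-NAME_]"}


def categorise(root):
    """Categorisation: replace each marked-up name with its category label."""
    for name in root.iter("name"):
        name.text = CATEGORY_LABELS.get(name.get("type"), "[_OTHER_]")
    return root


def pseudonymise(root, substitutes):
    """Pseudonymisation: replace each marked-up name with a fixed substitute.

    A missing substitute raises KeyError, so no original name slips through.
    """
    for name in root.iter("name"):
        name.text = substitutes[name.text]
    return root


categorised = ET.tostring(categorise(ET.fromstring(SAMPLE)), encoding="unicode")
pseudonymised = ET.tostring(
    pseudonymise(ET.fromstring(SAMPLE), {"Anna": "Lena", "Dortmund": "Essen"}),
    encoding="unicode",
)
print(categorised)
print(pseudonymised)
```

Categorisation keeps only the type of the removed item, whereas pseudonymisation keeps the text readable at the cost of maintaining a consistent substitution table.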

Files

cmccorpora17-26.pdf (383.0 kB), md5:e02d41519220f799860c5859b54b6d6c

Additional details

Is part of: Conference proceeding: 10.5281/zenodo.1040713 (DOI)

Resource type

Conference paper

Publisher

cmc-corpora conference series

Imprint

Proceedings of the 5th Conference on CMC and Social Media Corpora for the Humanities (cmccorpora17).

Conference

5th Conference on CMC and Social Media Corpora for the Humanities (cmccorpora17), Bolzano, Italy, 3-4 October 2017

Languages

English

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 4, 2017
Modified: August 3, 2024
