Conference paper Open Access

MoCoDa 2: Creating a Database and Web Frontend for the Repeated Collection of Mobile Communication (WhatsApp, SMS & Co)

Beißwenger, Michael; Fladrich, Marcel; Imo, Wolfgang; Ziegler, Evelyn

The poster reports on intermediate results of MoCoDa 2, an ongoing project funded by the Ministry for Innovation, Science, Research and Technology of the German federal state of North Rhine-Westphalia, in which we are developing a database and web frontend for the repeated, donation-based collection of CMC interactions from smartphone messaging apps like WhatsApp. The database is intended to serve as a resource for both quantitative and qualitative approaches to the analysis of CMC. MoCoDa 2 builds on experience from the preceding project MoCoDa, which has collected a (relatively small) set of 2,198 interactions with 19,161 user posts, or ca. 193,000 tokens, since 2012. For MoCoDa 2, the database and web frontend will be re-implemented from scratch and expanded with additional functions and features:
- A form for donating and editing the data, which involves the donors in the editing and anonymization process and assists them in capturing metadata on the context and topic of the donated sequences as well as on the interlocutors and their social relations. Anonymization will follow a guideline developed in the CLARIN-D curation project ChatCorpus2CLARIN.
- Part-of-speech annotations that comply with the extended 'STTS 2.0' tagset for German CMC and that will be created using a toolchain provided by the Language Technology Lab (LTL) at the University of Duisburg-Essen.
- A TEI export for the collected data on the basis of the 'CLARIN-D TEI schema for CMC'.
Adopting the STTS 2.0 tagset and a TEI-based export format will make the corpus data interoperable with corpora that are already part of the CLARIN-D corpus infrastructure at the Institute for the German Language (IDS) in Mannheim.
To allow comparative analyses of the MoCoDa 2 data with the discourse found in text corpora and in other CMC corpora, MoCoDa 2 will not only be made available as a standalone resource but will also be integrated into the German Reference Corpus (DeReKo) at the IDS Mannheim.
