Published October 17, 2023 | Version v1
Dataset
Restricted
The TongueSwitcher Corpus of German-English Code-Switching
Creators
Description
This is the TongueSwitcher Corpus of German-English code-switching tweets. It includes the train and dev sets, which carry automatic word-level language identification (subword-level for mixed words), alongside the human-annotated corpus, which comprises the test set and the interlingual homograph set.
BibTeX entry and citation info
@inproceedings{sterner2023tongueswitcher,
author = {Igor Sterner and Simone Teufel},
title = {TongueSwitcher: Fine-Grained Identification of German-English Code-Switching},
booktitle = {Sixth Workshop on Computational Approaches to Linguistic Code-Switching},
publisher = {Empirical Methods in Natural Language Processing},
year = {2023},
}