Published October 17, 2023 | Version v1
Dataset Restricted

The TongueSwitcher Corpus of German-English Code-Switching

Creators

Description

This is the TongueSwitcher Corpus of German-English code-switching tweets. It includes the train and dev sets, labeled with automatic word-level language identification (subword-level for mixed words), alongside the human-annotated corpus with the test and interlingual homograph sets.

BibTeX entry and citation info

@inproceedings{sterner2023tongueswitcher,
  author    = {Igor Sterner and Simone Teufel},
  title     = {TongueSwitcher: Fine-Grained Identification of German-English Code-Switching},
  booktitle = {Sixth Workshop on Computational Approaches to Linguistic Code-Switching},
  publisher = {Empirical Methods in Natural Language Processing},
  year      = {2023},
}

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

You need to satisfy these conditions in order for this request to be accepted:

The dataset may be used for academic research only. Such research must follow the Twitter terms of usage and user privacy. Access is granted only for research or educational institution accounts (email address). The dataset may not be used commercially or for mass surveillance. Further exclusions of use may be decided based on the access request.
