The TongueSwitcher Corpus of German-English Code-Switching

Sterner, Igor

doi:10.5281/zenodo.10011601

Published October 17, 2023 | Version v1

Dataset Restricted

This is the TongueSwitcher Corpus of German-English code-switching tweets. It includes the train and dev sets with automatic word-level language identification (subword-level for mixed words), alongside the human-annotated corpus with test and interlingual homograph sets.

BibTeX entry and citation info

@inproceedings{sterner2023tongueswitcher,
author = {Igor Sterner and Simone Teufel},
title = {TongueSwitcher: Fine-Grained Identification of German-English Code-Switching},
booktitle = {Sixth Workshop on Computational Approaches to Linguistic Code-Switching},
publisher = {Empirical Methods in Natural Language Processing},
year = {2023},
}

Files

Restricted

The record is publicly accessible, but files are restricted.

Request access

Access to these files must be requested, and a request is accepted only if the following conditions are satisfied. The dataset may be used for academic research only. Such research must follow the Twitter terms of use and respect user privacy. Access is granted only to accounts registered with a research or educational institution email address. The dataset may not be used commercially or for mass surveillance. Further exclusions of use may be decided based on the access request.

	All versions	This version
Views	342	342
Downloads	726	726
Data volume	91.7 GB	91.7 GB
