Published December 18, 2025 | Version v2

FR-MIGR-TWIT Corpus 2.0

  • 1. Université Lille Nord de France
  • 2. Institut Universitaire de France

Description

The MIGR-TWIT Corpus is a multilingual corpus of tweets on migration in Europe, developed within the framework of OLiNDiNUM (Observatoire LINguistique du DIscours NUMérique, Linguistic Observatory of Online Debate), with the aim of documenting and analyzing online public discourse on (im)migration in contemporary European politics. It addresses the global issue of migration over the last decade (2011–2022) and was published to allow observation of discursive evolution across the political spectrum and in two national contexts (France and the UK).

The FR-MIGR-TWIT Corpus 1.0, compiled from the FR-R and FR-L modules, comprises 17,395 tweets posted by 39 French political figures and parties (16 right-wing and 23 left-wing) between 2011 and 2022. Tweets containing migr- derivatives were retrieved through the Academic Research access of the Twitter API v2, and truncated retweets (>140 characters) were restored through targeted verification (for detailed information on each module, see the links above).

This second version provides multilayer linguistic annotations of all occurrences of forms derived from the Latin root migr-. 

The FR-MIGR-TWIT Corpus 2.0 offers:

  • multilayer linguistic annotations associated with each occurrence of a migr- derivative (MIGR-LEXICON), including semantic roles (ROLE_SEM), syntactic functions (FUNC_SYN), lemmatised forms (LEMMA), as well as features and collocational items related to modification (MODIFICATION, LEMMA_MODIF_*, LEMMA_NOUN-1) and list/parallelism constructions (LIST_PAR, LENGTH-1, #forme#_MIGR-LIST_PAR) (Non-exhaustive list);
  • tweet URLs (tweet_url) and 44 types of data retrieved through the Full Archive Search endpoints of the Twitter API v2, such as the textual content of tweets (data__text), posting date (data__created_at), user ID (data__author_id), and the numbers of retweets (data__public_metrics__retweet_count), likes (data__public_metrics__like_count), replies (data__public_metrics__reply_count), and quotes (data__public_metrics__quote_count) (Non-exhaustive list).
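
As an illustration of how the fields listed above might be queried, the following minimal Python sketch counts annotated occurrences per year and lemma. It assumes that the CSV export has one row per annotated migr- occurrence, uses the field names above as column headers, is comma-delimited, and keeps the ISO 8601 timestamps returned by the Twitter API v2; none of this is confirmed by the record. The sample rows and the FUNC_SYN values are invented for the example and are not corpus data.

```python
import csv
import io
from collections import Counter

# Invented toy rows for illustration only; they are NOT taken from the corpus.
# Column names follow the field names listed above. The layout (comma-delimited,
# one row per annotated occurrence) and the FUNC_SYN values are assumptions.
SAMPLE_CSV = """LEMMA,FUNC_SYN,data__created_at,data__public_metrics__retweet_count
migrant,subject,2015-09-03T10:00:00.000Z,12
immigration,object,2015-09-04T08:30:00.000Z,5
migrant,object,2017-05-01T17:45:00.000Z,3
"""


def lemma_counts_by_year(handle):
    """Count annotated occurrences per (year, lemma) pair."""
    counts = Counter()
    for row in csv.DictReader(handle):
        # Assumes ISO 8601 timestamps, whose first four characters are the year.
        year = row["data__created_at"][:4]
        counts[(year, row["LEMMA"])] += 1
    return counts


counts = lemma_counts_by_year(io.StringIO(SAMPLE_CSV))
for (year, lemma), n in sorted(counts.items()):
    print(year, lemma, n)
```

To run the same function on the distributed data, the in-memory sample would be replaced by an open file handle on the CSV file from the archive, after checking its actual headers and delimiter against the README.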
 
Changelog
version 2.0 (© 2025 Jeon & Pietrandrea)
– Added multilayer linguistic annotations
– Added TEI-XML format
– Added a basic Python query script
– Added README.md
 
Scientific reference:
 
Jeon, S. (2025). Le discours numérique sur l'immigration en France entre 2011 et 2022. Une analyse de corpus (Online Discourse on Immigration in France between 2011 and 2022. A Corpus Analysis), PhD Thesis, Université de Lille, France.
 
Jeon, S. (2025). “Constructing ‘Migration’ through Political Discourse: A Corpus Study of French Political Tweets (2011–2022)” In: FOM@Play: Migration, Identity and Transnational Discourses Congress, 2-5 September 2025, University of Granada, Spain.
 
Jeon, S. (2023). MigrTwit Corpora. (Im)migration Tweets of French Politics. In Proceedings of the 10th International Conference on CMC and Social Media Corpora for the Humanities, University of Mannheim; Leibniz-Institut für Deutsche Sprache (IDS).
 

Notes

The FR-MIGR-TWIT Corpus 2.0 is distributed under the Creative Commons CC-BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/). Its reuse is permitted for non-commercial purposes, including research and education.

Technical info

The corpus is available in CSV, XML, and TEI-XML formats. The CSV and XML files provide stand-off linguistic annotations and metadata. The TEI-XML files encode the canonical textual layer. 
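
The split between a canonical text layer and stand-off annotations can be sketched in Python as follows. This is only an illustration of the general stand-off principle: the TEI fragment, the use of `<p>` elements carrying `xml:id`, and the `target` key in the annotation records are hypothetical and may not match the actual encoding of the corpus files, which should be checked against the README.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # standard TEI namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # expanded name of xml:id

# Invented toy document; the element choice (<p> with xml:id) is hypothetical.
SAMPLE_TEI = f"""<TEI xmlns="{TEI_NS}">
  <text><body>
    <p xml:id="t1">Premier exemple de texte.</p>
    <p xml:id="t2">Second exemple de texte.</p>
  </body></text>
</TEI>"""

# Hypothetical stand-off records that point at text units by identifier.
ANNOTATIONS = [
    {"target": "t1", "LEMMA": "migrant"},
    {"target": "t2", "LEMMA": "immigration"},
]


def index_text_units(tei_source):
    """Map each xml:id to the plain text of its <p> element."""
    root = ET.fromstring(tei_source)
    return {
        el.get(XML_ID): "".join(el.itertext()).strip()
        for el in root.iter(f"{{{TEI_NS}}}p")
        if el.get(XML_ID) is not None
    }


def resolve(annotations, text_by_id):
    """Pair each annotation's lemma with the text unit it points to."""
    return [(a["LEMMA"], text_by_id[a["target"]]) for a in annotations]


text_by_id = index_text_units(SAMPLE_TEI)
resolved = resolve(ANNOTATIONS, text_by_id)
```

Keeping the text in one place and referring to it by identifier means that annotation layers can be added or revised without touching the textual layer.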

Series information

The FR-MIGR-TWIT Corpus 1.0 (© 2025 Jeon, Battaglia & Pietrandrea) was previously released via Zenodo.

Files

FR-MIGR-TWIT_2.0.zip

Files (11.7 MB)

md5:b14595998e34134a7d3aa6cb4e0d567d
11.7 MB
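
The MD5 checksum above can be used to confirm that a downloaded archive is intact. A minimal Python sketch follows, assuming the archive was saved locally under the file name shown on this record; the function names are illustrative only.

```python
import hashlib

# Checksum listed on this record for FR-MIGR-TWIT_2.0.zip.
EXPECTED_MD5 = "b14595998e34134a7d3aa6cb4e0d567d"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Check a local file against the published checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("FR-MIGR-TWIT_2.0.zip")` should return `True` for an intact download saved under that name in the current directory.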

Additional details

Related works

Is compiled by
Dataset: 10.5281/zenodo.7347479 (DOI)
Dataset: 10.5281/zenodo.7871602 (DOI)
Is part of
Thesis: https://theses.fr/2025ULILH003 (URL)
Conference proceeding: 10.14618/1z5k-pb25 (DOI)
Presentation: https://hal.science/hal-05240728v1 (URL)

Funding

Institut Universitaire de France
Université Lille Nord de France
Projet d'Internalisation 2021
Campus France
Hubert Curien Partnerships PHC Galilée 2018-19
Campus France
Hubert Curien Partnerships PHC Van Gogh 2018-19