Published November 22, 2022 | Version v2
Dataset Open

MIGR-TWIT Corpus. Migration Tweets of right and far-right politics in Europe

  • 1. Università della Svizzera italiana; Université de Lille
  • 2. University of Wolverhampton
  • 3. Université de Tours
  • 4. Université de Lille

Contributors

Project member:

  • 1. Université de Lille

Description

The MIGR-TWIT Corpus is a multilingual corpus of tweets on the topic of migration in Europe. It was created within the framework of the collaborative research project OLiNDiNUM (Observatoire LINguistique du DIscours NUMérique, Linguistic Observatory of Online Debate), with the aim of developing language databases of online debate. The corpus considers the global issue of migration in the British and French political contexts of the period from 2011 to 2022 and consists of two sub-corpora:

  • FR-R-MIGR-TWIT-2011-2022 Corpus for French language data (1 January 2011 - 30 June 2022) and 

  • UK-R-MIGR-RA-TWIT-2012-2022 Corpus for English language data (1 January 2012 - 5 September 2022)  

Using the Twitter API v2 Academic Research, tweets containing at least one occurrence of a migration- or refugee-related word were retrieved automatically from the accounts of 28 right and far-right political figures and parties. The whole corpus contains 18,233 tweets and 533,198 words.

Scientific reference:

Pietrandrea, P., Battaglia, E. (2022). “Migrants and the EU”. The diachronic construction of ad hoc categories in French far-right discourse. Journal of Pragmatics 192, 139-157.

Blandino, G. (2023). 10 years of public debate on immigration: combining topic modeling and corpus linguistics to examine the British (far-)right discourse on Twitter, MA thesis, University of Wolverhampton.

Jeon, S. (2025). Le discours numérique sur l'immigration en France entre 2011 et 2022. Une analyse de corpus (Online Discourse on Immigration in France between 2011 and 2022. A Corpus Analysis), PhD Thesis, Université de Lille, France.

Contents

The corpus is distributed as zipped CSV (tabular) files, organized by sub-corpus, and is provided in two versions. The text-only version has only the tweet identifier (data__id) and the tweet text (data__text) as header columns; its folders, FR-R-MIGR-TWIT-2011-2022_textonly and UK-R-MIGR-RA-TWIT-2012-2022_textonly, contain 12 and 11 Zip files respectively, one per year. The metadata version includes all tweet fields as header columns, such as the posting date (data__created__at), the username (author__name) and the number of retweets (data__public_metrics__retweet_count); its folders are named FR-R-MIGR-TWIT-2011-2022_meta and UK-R-MIGR-RA-TWIT-2012-2022_meta. Details of each sub-corpus are given below.
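As a minimal sketch of how one yearly archive of the text-only version could be read, assuming each Zip file holds one or more UTF-8 CSV files with the data__id and data__text header columns named above. The archive layout, the encoding and the function name read_text_only are assumptions for illustration, not documented properties of the files:

```python
import csv
import io
import zipfile


def read_text_only(zip_source):
    """Return (tweet id, tweet text) pairs from every CSV file in one Zip archive.

    zip_source may be a file path or a binary file-like object.
    Assumes UTF-8 CSV members with data__id and data__text columns.
    """
    tweets = []
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            if not member.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV file
            with archive.open(member) as raw:
                # utf-8-sig also drops a byte-order mark if one is present
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                for row in csv.DictReader(text):
                    # keep identifiers as strings so long IDs are never rounded
                    tweets.append((row["data__id"], row["data__text"]))
    return tweets
```

Tweet identifiers are kept as strings here because they are long integers that some tools silently round when they are parsed as numbers.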

1. FR-R-MIGR-TWIT-2011-2022  

  • Created at: 2022-08-08
  • Language: FR 

  • Coverage: 16 user accounts; 11,761 tweets; 358,491 words

  • Time of data collection: start=2011-01-01; end=2022-06-30 

  • Keywords: words derived from the Latin root “migr” (from migrare)

  • Corpus composition: 

No.  Political figure/party   Username         Tweets  Years covered
1    Michel Barnier           @MichelBarnier   31      2017-22
2    Valérie Pécresse         @vpecresse       81      2017-22
3    Rassemblement National   @RNational_off   3,347   2017-22
4    Nicolas Dupont-Aignan    @dupontaignan    663     2011-22
5    Éric Ciotti              @ECiotti         1,007   2012-22
6    Christian Estrosi        @cestrosi        137     2011-22
7    Marine Le Pen            @MLP_officiel    1,650   2011-22
8    Valérie Boyer            @valerieboyer13  837     2012-22
9    Florian Philippot        @f_philippot     485     2012-22
10   Xavier Bertrand          @xavierbertrand  70      2017-22
11   Marion Maréchal          @MarionMarechal  479     2012-17,19-22
12   Philippe Meunier         @Meunier_Ph      245     2013-22
13   Jordan Bardella          @J_Bardella      1,095   2013-22
14   Nicolas Bay              @NicolasBay_     1,260   2017-22
15   Emmanuel Macron          @EmmanuelMacron  72      2017-22
16   Éric Zemmour             @ZemmourEric     302     2019-22
17   Jean Messiha*            Banned from Twitter (since July 2021)  -  -
  • The political figures and parties in the table above are listed in chronological order of the date on which they posted their first tweet.
  • *Before the launch of the Twitter API v2 Academic Research, migr-tweets were collected from the Europresse.com database, including 1,453 tweets by Jean Messiha, as part of the reference study (Pietrandrea & Battaglia 2022). However, his Twitter account has been permanently banned since July 2021, and because our data collection with the Twitter API started in September 2021, we could not access it. We therefore decided not to include his tweets in FR-R-MIGR-TWIT-2011-2022, for the sake of consistency with the rest of the Twitter data, which were retrieved automatically.

  • The sub-corpus FR-R-MIGR-TWIT-2017-2022 is developed, annotated and analyzed as part of a doctoral thesis in progress (Jeon, 2025), with the aim of studying the semantic construction of the migr-lexicon over the period from 2011 to 2022.
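The per-account tweet counts and year ranges in the composition table can be cross-checked against the metadata version of the files. The sketch below tallies rows by account and posting year. It assumes that data__created__at begins with a four-digit year, as in ISO 8601 timestamps, which the description does not state explicitly; the function name and the sample rows are invented for illustration and are not taken from the corpus:

```python
import csv
import io
from collections import Counter


def tweets_per_account_and_year(csv_file):
    """Count rows per (author__name, posting year) in a metadata CSV.

    csv_file is an open text file (or any iterable of CSV lines).
    Assumes data__created__at starts with the year, e.g. 2017-05-03T10:00:00.000Z.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        year = row["data__created__at"][:4]
        counts[(row["author__name"], year)] += 1
    return counts


# Invented rows, only to show the expected column layout.
sample = io.StringIO(
    "data__id,data__created__at,author__name\n"
    "1,2017-05-03T10:00:00.000Z,account_a\n"
    "2,2017-09-01T08:30:00.000Z,account_a\n"
    "3,2018-01-15T12:00:00.000Z,account_b\n"
)
for (account, year), n in sorted(tweets_per_account_and_year(sample).items()):
    print(account, year, n)
```

Summing the counts for one account over all years should reproduce its entry in the Tweets column, provided the column assumptions hold.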

 

2. UK-R-MIGR-RA-TWIT-2012-2022 

  • Created at: 2022-09-06

  • Language: EN

  • Coverage: 12 user accounts; 6,472 tweets; 174,707 words 

  • Time of data collection: start=2012-01-01; end=2022-09-05

  • Keywords: words derived from the Latin root “migr” (from migrare), in addition to the keywords “refugee(s)” and “asylum”.

  • Corpus composition:

No.  Political figure/party   Username         Tweets  Years covered
1    David Cameron            @David_Cameron   32      2012-22
2    Amber Rudd               @AmberRuddUK     29      2012-22
3    Sajid Javid              @sajidjavid      84      2012-22
4    Boris Johnson            @BorisJohnson    80      2015-22
5    Priti Patel              @pritipatel      304     2012-22
6    UK Home Office           @ukhomeoffice    909     2012-22
7    Nigel Farage             @Nigel_Farage    1,010   2012-22
8    Richard Tice             @TiceRichard     180     2013-22
9    UKIP                     @UKIP            2,746   2012-22
10   Neil Hamilton            @NeilUKIP        252     2013-22
11   Nick Griffin             @NickGriffinBU   542     2012-22
12   Robin Tilbrook           @RobinTilbrook   304     2012-22

 

  • 2 of the 12 accounts are official accounts, belonging to the “UK Home Office” department and to the “UKIP” (United Kingdom Independence Party) party. The other 10 are the accounts of political figures.

  • The corpus UK-R-MIGR-RA-TWIT-2012-2022 will be used for the following master’s thesis: Blandino, G. (2023). 10 years of public debate on immigration: combining topic modeling and corpus linguistics to examine the British (far-)right discourse on Twitter, MA thesis, University of Wolverhampton.
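The keyword criteria of the two sub-corpora can be approximated offline with regular expressions. This is only a sketch: the exact query sent to the Twitter API is not given here, so the patterns and the function names matches_fr and matches_uk are assumptions that reproduce the stated criteria (any word containing the root “migr”, plus “refugee(s)” and “asylum” for the English data):

```python
import re

# Any word containing the root "migr": migrant, immigration, émigrer, ...
MIGR_WORD = re.compile(r"\w*migr\w*", re.IGNORECASE)

# Extra keywords used only for the English sub-corpus.
UK_EXTRA = re.compile(r"\b(?:refugees?|asylum)\b", re.IGNORECASE)


def matches_fr(text):
    """True if the text contains at least one migr-word."""
    return MIGR_WORD.search(text) is not None


def matches_uk(text):
    """True if the text contains a migr-word, 'refugee(s)' or 'asylum'."""
    return matches_fr(text) or UK_EXTRA.search(text) is not None


for tweet in ["Stop à l'immigration massive", "Refugees and asylum seekers", "Bonjour à tous"]:
    print(tweet, "|", matches_fr(tweet), matches_uk(tweet))
```

A filter of this kind can be useful for checking that every tweet in a downloaded file satisfies the criterion of its sub-corpus.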

 

Notes

Funding acknowledgements: Université de Lille, Projet d'Internationalisation 2021; Université Franco-italienne / Università Italo Francese; Campus France (Hubert Curien Partnerships): Italy - PHC Galilée 2018-19; Netherlands - PHC Van Gogh 2018-19

Files

FR-R-MIGR-TWIT-2011-2022_meta.csv

Files (36.1 MB)

Checksum                              Size
md5:e6471950bc0af17063c8f5ecd2e33579  9.1 MB
md5:d62d5d191c46debaa190978108830873  10.0 kB
md5:00e257485e42d0c0b5a4b14043ca5bde  30.3 kB
md5:7e38c43ad90d199550c21d877edc1e32  43.8 kB
md5:6c06006ecf5dd5899928d2531b2d798e  100.6 kB
md5:24f70984b1082783fb16131c0894b529  599.0 kB
md5:27b106fe2b845c9d255c8b6a799e66cc  377.3 kB
md5:b7b561fdaa5ea261eceaa8f0b1e39d43  1.1 MB
md5:abed62f4114982eb6ff0aaa6d9074bce  2.3 MB
md5:5cba530821ca6790deed0429a5484b77  1.6 MB
md5:4b763287a0251c691d1d211d488ebdba  774.6 kB
md5:5938fadc38b849ea4473a763422eb98f  1.2 MB
md5:123c3b6cfa6cd391f7672ffe987500ec  1.0 MB
md5:2eb3f8c2752e129f76e5feec316c225c  6.8 MB
md5:70003c2f60a4ae5cceac1dad886bf2a6  5.5 MB
md5:7d4235f60eaad00a4fa7aeb1ff6e5e8f  61.6 kB
md5:14d6bb2aa218123a252fcb8ce47dccd8  210.1 kB
md5:48ab65b304a9d796d9d5068004ae5ad2  261.8 kB
md5:59f7424d33f213b004baf35ef545b1b2  609.0 kB
md5:3890671141252c9508b0afb9525034fd  426.4 kB
md5:62501f2bab584c45cd16d10421454c75  280.2 kB
md5:350ae9706713d9060277ae22d08b1900  446.2 kB
md5:4979991b72a64b057e2bd8d0896eb389  576.3 kB
md5:56fa53986f4f9b030326b9f223fde803  857.5 kB
md5:7a78bb3a1f4e3b9de271e0e440cc247e  1.1 MB
md5:a554fe95e01c11ae225f45b4c0e528d7  697.1 kB
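
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. A minimal sketch using Python's standard hashlib module follows; the function names and the local file path in the usage comment are hypothetical:

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, listed_checksum):
    """Compare a file with a listed checksum; an 'md5:' prefix is accepted."""
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage with a hypothetical local file name:
# is_intact("downloaded_file.zip", "md5:e6471950bc0af17063c8f5ecd2e33579")
```

Reading in chunks keeps memory use constant, which matters little for files of this size but makes the helper reusable for larger downloads.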

Additional details

Related works

Continues
Dataset: 10.5281/zenodo.6302763 (DOI)

References

  • Pietrandrea, P., Battaglia, E. (2022). "Migrants and the EU". The diachronic construction of ad hoc categories in French far-right discourse. Journal of Pragmatics 192, 139-157.
  • Pietrandrea, P., Battaglia, E. (2019). La costruzione et l'introduzione implicita di entità categoriali ad hoc nel discorso sui migranti e sull'Europa. Workshop "Gli impliciti come mezzo di persuasione", LIII Congresso Internazionale della Società di Linguistica italiana, Università dell'Insubria, Como, 19–21 September 2019.
  • Battaglia, E., Jeon, S. (2021). Étudier le discours online de la droite française sur l'immigration: le corpus FrRMigr–Twit et ses applications. UMR 8163 STL, Université de Lille, 16 December 2021.
  • Blandino, G. (2023). 10 years of public debate on immigration: combining topic modeling and corpus linguistics to examine the British (far-)right discourse on Twitter, MA thesis University of Wolverhampton.
  • Jeon, S. (2025). Le discours numérique sur l'immigration en France entre 2011 et 2022. Une analyse de corpus (Online Discourse on Immigration in France between 2011 and 2022. A Corpus Analysis). PhD thesis, Université de Lille.