brexit_timeline.csv The number of tweets per hour:
head brexit_timeline.csv
2016-05-06-11 2044
2016-05-06-12 2241
2016-05-06-13 2494
2016-05-06-14 2092
2016-05-06-15 2126
2016-05-06-16 2245
2016-05-06-17 2006
2016-05-06-18 1675
2016-05-06-19 1708
2016-05-06-20 1563
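The timeline file appears to be whitespace-separated "YYYY-MM-DD-HH count" pairs, as in the `head` output above. A minimal parsing sketch under that assumption (the sample data is taken from the output shown):

```python
# Parse hourly tweet counts from the "YYYY-MM-DD-HH count" format
# shown above (assumption: columns are whitespace-separated).
sample = """\
2016-05-06-11 2044
2016-05-06-12 2241
2016-05-06-13 2494
"""

counts = {}
for line in sample.splitlines():
    hour, count = line.split()
    counts[hour] = int(count)

# For the full file, replace `sample.splitlines()` with iteration
# over open("brexit_timeline.csv").
busiest = max(counts, key=counts.get)  # '2016-05-06-13' in this sample
```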
brexit_tweets_ids.csv.gz The tweet ids:
zcat brexit_tweets_ids.csv.gz | head  # Try gzcat if zcat doesn't work.
728541290133127168
728541295304712192
728541297045344257
728541299649941504
728541301138984964
728541301013155840
728541314497712132
728541317094084608
728541326673907713
728541326673887234
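If you prefer to read the id file from Python rather than the shell, the standard-library gzip module handles it directly. A self-contained sketch (it writes a few of the ids shown above into a temporary gzip file, then reads them back the way the real file would be read; the file name `sample_ids.csv.gz` is illustrative):

```python
import gzip

# Self-contained demo: round-trip a few of the ids shown above
# through a gzip file, one id per line.
sample_ids = ["728541290133127168", "728541295304712192", "728541297045344257"]

with gzip.open("sample_ids.csv.gz", "wt") as f:
    f.write("\n".join(sample_ids) + "\n")

# Reading brexit_tweets_ids.csv.gz works the same way.
with gzip.open("sample_ids.csv.gz", "rt") as f:
    ids = [line.strip() for line in f]
```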
You can get the data using twarc and poultry:
# Set up the credentials
export CONSUMER_KEY=...
export CONSUMER_SECRET=...
export ACCESS_TOKEN=...
export ACCESS_TOKEN_SECRET=...

mkdir t

# Hydrate the tweets using twarc and group them with poultry by day.
time gzcat brexit_tweets_ids.csv.gz | twarc.py --hydrate - | poultry group -t 't/%Y-%m-%d.gz'

t/2016-05-06.gz
t/2016-05-07.gz
...

# Half a million tweets are collected in about 4 hours.
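The roughly four-hour runtime follows from how hydration works: the Twitter lookup endpoint accepts at most 100 ids per request, so twarc sends the ids in batches and is bounded by API rate limits. A batching sketch for illustration (twarc does this internally; `batches` is a hypothetical helper, not part of twarc's API):

```python
# Hydration sends ids to the Twitter API in batches of up to 100
# (the statuses/lookup limit). A sketch of the batching step:
def batches(ids, size=100):
    """Yield successive chunks of at most `size` ids."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

# Half a million ids works out to about 5,000 API calls,
# which is why hydration takes hours under rate limiting.
n_calls = len(list(batches(list(range(500_000)))))
```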
The tweets were collected with Poultry using the infrastructure of the School of Electronic Engineering and Computer Science at Queen Mary University of London.
Poultry was created as part of my master's thesis, where it was used to collect tweets about music festivals in Europe and the London Olympics. That work later resulted in a paper presented at the RAMSS workshop at WWW '13 in Rio.