brexit_timeline.csv The number of tweets per hour:
head brexit_timeline.csv
2016-05-06-11 2044
2016-05-06-12 2241
2016-05-06-13 2494
2016-05-06-14 2092
2016-05-06-15 2126
2016-05-06-16 2245
2016-05-06-17 2006
2016-05-06-18 1675
2016-05-06-19 1708
2016-05-06-20 1563
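The timeline file appears to be whitespace-separated "YYYY-MM-DD-HH count" pairs, as in the `head` output above. A minimal parsing sketch under that assumption (the sample data is taken from the output shown):

```python
# Parse hourly tweet counts from the "YYYY-MM-DD-HH count" format
# shown above (assumption: columns are whitespace-separated).
sample = """\
2016-05-06-11 2044
2016-05-06-12 2241
2016-05-06-13 2494
"""

counts = {}
for line in sample.splitlines():
    hour, count = line.split()
    counts[hour] = int(count)

# For the full file, replace `sample.splitlines()` with iteration
# over open("brexit_timeline.csv").
busiest = max(counts, key=counts.get)  # '2016-05-06-13' in this sample
```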
brexit_tweets_ids.csv.gz The tweet ids:
zcat brexit_tweets_ids.csv.gz | head  # Try gzcat if zcat doesn't work.
728541290133127168
728541295304712192
728541297045344257
728541299649941504
728541301138984964
728541301013155840
728541314497712132
728541317094084608
728541326673907713
728541326673887234
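If you prefer to read the id file from Python rather than the shell, the standard-library gzip module handles it directly. A self-contained sketch (it writes a few of the ids shown above into a temporary gzip file, then reads them back the way the real file would be read; the file name `sample_ids.csv.gz` is illustrative):

```python
import gzip

# Self-contained demo: round-trip a few of the ids shown above
# through a gzip file, one id per line.
sample_ids = ["728541290133127168", "728541295304712192", "728541297045344257"]

with gzip.open("sample_ids.csv.gz", "wt") as f:
    f.write("\n".join(sample_ids) + "\n")

# Reading brexit_tweets_ids.csv.gz works the same way.
with gzip.open("sample_ids.csv.gz", "rt") as f:
    ids = [line.strip() for line in f]
```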
You can get the data using twarc and poultry:
# Set up the credentials
export CONSUMER_KEY=...
export CONSUMER_SECRET=...
export ACCESS_TOKEN=...
export ACCESS_TOKEN_SECRET=...

mkdir t

# Hydrate the tweets using twarc and group them with poultry by day.
time gzcat brexit_tweets_ids.csv.gz | twarc.py --hydrate - | poultry group -t 't/%Y-%m-%d.gz'

t/2016-05-06.gz
t/2016-05-07.gz
...

# Half a million tweets are collected in about 4 hours.
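The roughly four-hour runtime follows from how hydration works: the Twitter lookup endpoint accepts at most 100 ids per request, so twarc sends the ids in batches and is bounded by API rate limits. A batching sketch for illustration (twarc does this internally; `batches` is a hypothetical helper, not part of twarc's API):

```python
# Hydration sends ids to the Twitter API in batches of up to 100
# (the statuses/lookup limit). A sketch of the batching step:
def batches(ids, size=100):
    """Yield successive chunks of at most `size` ids."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

# Half a million ids works out to about 5,000 API calls,
# which is why hydration takes hours under rate limiting.
n_calls = len(list(batches(list(range(500_000)))))
```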
The tweets were collected with Poultry using the infrastructure of the School of Electronic Engineering and Computer Science at Queen Mary University of London.
Poultry was created as part of my master's thesis, where it was used to collect tweets about music festivals in Europe and the London Olympics. That work later resulted in a paper presented at the RAMSS workshop at WWW '13 in Rio.