Published April 7, 2020 | Version v1
Dataset Open

Dataset used in the paper: "Scaling laws and dynamics of hashtags on Twitter"

  • 1. The University of Sydney
  • 2. The University of Auckland
  • 3. US Army Research Laboratory & the Rensselaer Polytechnic Institute

Description

This dataset was used in the manuscript "Scaling laws and dynamics of hashtags on Twitter".

The Twitter data was obtained from a 10% sample of all public tweets, provided by the Twitter streaming application programming interface (API). We extracted the hashtags from each tweet and counted how many times each hashtag was used in different time intervals. Time intervals of three different lengths were used: days, hours, and minutes. The tweets were published between November 1st, 2015 and November 30th, 2016, but not all time intervals between these dates are available.
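The counting step described above can be sketched as follows. This is only an illustration, not the pipeline used for the manuscript: the hashtag regular expression, the (timestamp, text) representation of a tweet, and the function name count_hashtags are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timezone

# Assumed hashtag pattern: "#" followed by word characters.
# The rule actually used to build the dataset is not stated in this record.
HASHTAG_RE = re.compile(r"#(\w+)")

# strftime patterns that truncate a timestamp to the chosen interval length.
FORMATS = {
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H",
    "minute": "%Y-%m-%d %H:%M",
}


def count_hashtags(tweets, interval="day"):
    """Count hashtag occurrences per time interval.

    `tweets` is an iterable of (timezone-aware datetime, text) pairs,
    a hypothetical representation chosen for this sketch.
    """
    fmt = FORMATS[interval]
    counts = defaultdict(Counter)
    for timestamp, text in tweets:
        # Bin by GMT/UTC, matching the time zone used for the file names.
        key = timestamp.astimezone(timezone.utc).strftime(fmt)
        for tag in HASHTAG_RE.findall(text):
            counts[key][tag] += 1
    return counts


# Made-up example tweets, not taken from the dataset.
tweets = [
    (datetime(2015, 11, 1, 0, 0, 30, tzinfo=timezone.utc), "Hello #world #python"),
    (datetime(2015, 11, 1, 0, 1, 5, tzinfo=timezone.utc), "#world again"),
    (datetime(2015, 11, 2, 12, 0, 0, tzinfo=timezone.utc), "just #world"),
]
daily = count_hashtags(tweets, "day")
per_minute = count_hashtags(tweets, "minute")
```

The same tweets give different counts depending on the interval length, which is why the dataset stores days, hours, and minutes separately.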

Each of the four files in this dataset corresponds to one folder (archived using tar). Each folder contains compressed .csv files (compressed using gzip). The contents of the .csv files in each folder are:

hashtags_frequency_day.tar
Counts of hashtags in each day. The name of each file in the folder indicates the date (GMT). The entries in each file are the hashtag and the count in the interval.

hashtags_frequency_hour.tar
Counts of hashtags in each hour. The name of each file in the folder indicates the date (GMT). The entries in each file are the hashtag and the count in the interval.

hashtags_frequency_minutes.tar
Counts of hashtags in each minute. The name of each file in the folder indicates the date (GMT, only a fraction of all days is available). The entries in each file are the hashtag and the count in the interval.

number_of_tweets.tar
Counts of the number of tweets in each minute. The name of each file in the folder indicates the day. The entries in each file are the minute in the day (GMT) and the count of tweets in our dataset.
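A minimal sketch of reading the gzip-compressed .csv files directly from one of the tar archives, without unpacking them to disk first. The function name iter_rows is hypothetical, and the comma delimiter, the UTF-8 encoding, and the ".gz" file suffix are assumptions: this record does not state them, so they may need adjusting after inspecting one file.

```python
import csv
import gzip
import io
import tarfile


def iter_rows(tar_path, delimiter=","):
    """Yield (member_name, row) for every row of every .gz member.

    Rows are returned as lists of strings. No header handling or type
    conversion is done, because the exact column layout is not
    specified in the dataset description.
    """
    with tarfile.open(tar_path, "r") as archive:
        for member in archive:
            # Skip directories and anything that is not a gzip file.
            if not member.isfile() or not member.name.endswith(".gz"):
                continue
            raw = archive.extractfile(member)
            # Decompress on the fly and decode as text for the csv module.
            with gzip.open(raw, mode="rt", encoding="utf-8", newline="") as text:
                for row in csv.reader(text, delimiter=delimiter):
                    yield member.name, row
```

Calling iter_rows("hashtags_frequency_day.tar") would then yield the file name (which encodes the date) together with each entry, for example a hashtag and its count in that interval.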

Files (6.4 GB)

Checksum Size
md5:ac2d359bc395db6a4dd843a03a313937 1.5 GB
md5:41280ba76e2069317ed5e035dcf6fcf6 4.0 GB
md5:a7ac22eaea103b2c04389273d952b58a 817.2 MB
md5:385e7a3545209c87c76fbc9440b3af15 111.0 MB
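The md5 values listed above can be used to check that a download completed without corruption. The sketch below computes the MD5 digest of a local file in chunks and compares it with a listed value. The function names are hypothetical, and the listing does not show which checksum belongs to which archive, so that pairing has to be taken from the download page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so multi-gigabyte archives fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:ac2d...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, matches_listing("number_of_tweets.tar", "md5:...") returns True only if the local file has exactly the listed digest, where the checksum string is the one shown next to that archive on the download page.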