Published March 8, 2019 | Version v1
Dataset Open

Twitter cascade datasets

  • 1. IIT Kharagpur

Description

This repository contains a set of Twitter datasets with metadata of tweets and retweets posted during specific events, such as the 2015 Nepal earthquake, IPL 2018, and the 15-M movement in Spain, as well as tweets posted by a celebrity (Lady Gaga) and her followers. The details of each dataset are as follows:

 

1. 2015 Nepal Earthquake: In "followers_network", each line lists the follower IDs of one user. The files "timeseries.txt" and "userseries.txt" contain, respectively, the sorted retweet timestamps and the sorted sequence of retweeting users, with one cascade per line.
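As a minimal sketch of how the two files might be combined, the snippet below pairs each retweet timestamp with its retweeting user. It rests on assumptions not stated in the description: that values on a line are separated by whitespace, that timestamps are numeric, and that line i of "timeseries.txt" describes the same cascade as line i of "userseries.txt". The function name `pair_cascades` is hypothetical.

```python
from typing import Iterable, List, Tuple


def pair_cascades(
    time_lines: Iterable[str], user_lines: Iterable[str]
) -> List[List[Tuple[float, str]]]:
    """Pair retweet timestamps with retweeting users, one cascade per line.

    Assumes whitespace-separated values and line-by-line alignment of the
    two inputs (both are assumptions, not documented facts).
    """
    cascades = []
    for t_line, u_line in zip(time_lines, user_lines):
        times = [float(tok) for tok in t_line.split()]
        users = u_line.split()
        if len(times) != len(users):
            raise ValueError("timestamp and user counts differ for a cascade")
        cascades.append(list(zip(times, users)))
    return cascades


# Possible usage with the files from the folder:
# with open("timeseries.txt") as t, open("userseries.txt") as u:
#     cascades = pair_cascades(t, u)
```

If the counts differ in the real files (for example, if one file also includes the original tweet), the length check would need to be relaxed.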

 

2. IPL 2018: This folder contains the file "cascade-intervals-IPL.txt", in which each line gives the sequence of inter-retweet time intervals for one cascade.
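Inter-retweet intervals can be turned back into cumulative time offsets by a running sum. The sketch below assumes, without confirmation from the description, that intervals on a line are numeric and whitespace-separated; the function name `intervals_to_offsets` is hypothetical.

```python
from itertools import accumulate
from typing import List


def intervals_to_offsets(line: str) -> List[float]:
    """Convert one line of inter-retweet intervals into cumulative offsets.

    Assumes whitespace-separated numeric intervals (an assumption).
    """
    intervals = [float(tok) for tok in line.split()]
    return list(accumulate(intervals))


# Possible usage:
# with open("cascade-intervals-IPL.txt") as fh:
#     offsets_per_cascade = [intervals_to_offsets(line) for line in fh]
```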

 

3. 15-M: This folder contains a .csv file and a .txt file holding tweet metadata, one tweet per line, in the following format:

idt;segs;hashtags;mentions

where idt is the tweet ID, segs is the segment number of the tweet, hashtags is the set of hashtags used in the tweet text (separated by whitespace), and mentions lists the IDs of the users mentioned in the tweet text.
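A line in this format could be split as sketched below. The separator inside the mentions field is not specified, so whitespace is assumed here, and all values are kept as strings because their types are not documented. The function name `parse_15m_line` is hypothetical.

```python
from typing import Dict, List, Union


def parse_15m_line(line: str) -> Dict[str, Union[str, List[str]]]:
    """Split one 'idt;segs;hashtags;mentions' line into its four fields.

    Assumes mentions are whitespace-separated, like hashtags (an assumption).
    """
    idt, segs, hashtags, mentions = line.rstrip("\n").split(";", 3)
    return {
        "idt": idt,
        "segs": segs,
        "hashtags": hashtags.split(),
        "mentions": mentions.split(),
    }
```

A header row, if the files have one, would need to be skipped before calling this function.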

 

4. Lady Gaga: This folder contains metadata of tweets and retweets in the following format:

User_Name
Tweet_ID
Time
Via
retweet_from
reply_to_user    reply_to_tweet (if not a reply, just "-1")
content
Number_of_link_in_tweet
type_of_link1    link1
type_of_link2    link2
type_of_link3    link3
...
 
 
Tweets were crawled for users related to "Lady Gaga" and for 10,000 randomly selected followers of hers, from Jan 1, 2010 to Oct, 2010 and from Oct 1, 2010 to Jan 15, 2010.
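The multi-line record layout listed for the Lady Gaga data could be read as sketched below. Several details are assumptions rather than documented facts: that each field occupies its own line, that the two values on the reply line and on each link line are separated by whitespace, and that "-1" marks a missing reply target. The function name `read_record` and the dictionary keys are hypothetical.

```python
from typing import Dict, Iterator, Optional


def _none_if_missing(value: str) -> Optional[str]:
    """Map the "-1" placeholder to None (assumed meaning: not a reply)."""
    return None if value == "-1" else value


def read_record(lines: Iterator[str]) -> Dict[str, object]:
    """Read one tweet record from an iterator of lines."""
    record: Dict[str, object] = {
        "user_name": next(lines).strip(),
        "tweet_id": next(lines).strip(),
        "time": next(lines).strip(),
        "via": next(lines).strip(),
        "retweet_from": next(lines).strip(),
    }
    # The reply line may hold two values, or a single "-1".
    reply_parts = next(lines).split()
    record["reply_to_user"] = _none_if_missing(reply_parts[0])
    record["reply_to_tweet"] = _none_if_missing(reply_parts[-1])
    record["content"] = next(lines).rstrip("\n")
    n_links = int(next(lines))
    links = []
    for _ in range(n_links):
        link_type, link = next(lines).split(None, 1)
        links.append((link_type, link.strip()))
    record["links"] = links
    return record
```

Calling `read_record` repeatedly on the same file iterator would yield consecutive records, provided the files contain no separator lines between them.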

     

Files

15m.zip

Files (915.9 MB)

md5:a63f87b98edc8a58103be7b9399f4c72  9.6 MB
md5:7728176c19f93afae31df033810d4d07  361.9 kB
md5:1ba66bc0aae745406839110ce92f7ccd  821.8 MB
md5:9347a039fc7b1389476c0a73015c23f9  84.2 MB