Published April 10, 2021 | Version v1
Conference paper | Open Access

Evolution of Retweet Rates in Twitter User Careers: Analysis and Model

  • 1. MIT
  • 2. EPFL

Description

About this repository

This repository contains the data and code for the paper "Evolution of Retweet Rates in Twitter User Careers: Analysis and Model", accepted at the International Conference on Web and Social Media (ICWSM) 2021.

The repository contains 4 datasets.

Each filename starts with the name of its dataset: verified, political, despoina, or despoina_random.

Each dataset has the following files:

## Data

1. $DATASET_NAME$_tweets_num_followers.txt.gz -- contains five tab-separated columns: Twitter user id, tweet id, retweet_count, timestamp of the tweet, and (estimated) number of followers at the time of tweeting.
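The abstract below normalizes retweet counts by follower counts. A minimal sketch of computing that per-follower rate from this file, assuming there is no header row and the columns appear in the order listed above. The function name `per_follower_rates` is hypothetical, and timestamps are kept as strings because their format is not documented here.

```python
import csv
import gzip
from collections import defaultdict


def per_follower_rates(path):
    """Return {user_id: [(timestamp, retweets_per_follower), ...]}.

    Assumes no header row and the column order:
    user id, tweet id, retweet_count, timestamp, estimated followers.
    """
    rates = defaultdict(list)
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) < 5:
                continue  # skip blank or malformed lines
            user_id, _tweet_id, retweets, timestamp, followers = row[:5]
            followers = float(followers)
            # Skip tweets with no (estimated) followers to avoid dividing by zero.
            if followers > 0:
                rates[user_id].append((timestamp, int(retweets) / followers))
    return rates
```

For example, `per_follower_rates("verified_tweets_num_followers.txt.gz")` would group the verified dataset's tweets by user.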

2. $DATASET_NAME$_follower_counts_wayback_archive.txt -- contains the raw follower-count information obtained by scraping archive.org, in three tab-separated columns: username, date of the archive.org crawl, and number of followers.
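A minimal sketch of grouping the raw archive counts into one series per username, assuming no header row and follower counts stored as plain integers (optionally with thousands separators). The function name `load_archive_counts` is hypothetical; if the scraped values contain abbreviations such as "12K", the parsing step would need adapting. Crawl dates are kept as strings because their format is not documented here.

```python
import csv
from collections import defaultdict


def load_archive_counts(path):
    """Group raw archive.org follower counts by username.

    Returns {username: [(crawl_date, follower_count), ...]} in file order.
    """
    counts = defaultdict(list)
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) < 3:
                continue  # skip blank or malformed lines
            username, crawl_date, followers = row[:3]
            # Assumption: counts are integers, possibly written as "1,234".
            counts[username].append((crawl_date, int(followers.replace(",", ""))))
    return counts
```

Note that this file is keyed by username (screen name), whereas the tweets file uses user ids.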

3. $DATASET_NAME$_fit_functions.tar.gz -- a compressed folder containing one pickle file per user. Each pickle file stores a polynomial function fit to that user's archive.org follower counts. This function can be used to estimate the number of followers the user had at any point in time:

`import pickle`

`with open(dataset + "/fit_functions/" + user + ".pickle", "rb") as f: user_func = pickle.load(f)`

`num_followers = user_func(timestamp)  # estimated follower count at the given timestamp`
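The record does not say how the polynomial functions were produced. A minimal sketch of one way to fit and store such a function, assuming NumPy: `numpy.polyfit` returns least-squares coefficients and `numpy.poly1d` wraps them in a callable, picklable object. Whether the released pickles are `numpy.poly1d` objects, and which polynomial degree and timestamp units were used, are not stated here; the degree of 2 and the helper names below are illustrative assumptions.

```python
import pickle

import numpy as np


def fit_follower_function(timestamps, follower_counts, degree=2):
    """Least-squares polynomial fit of follower counts over time.

    The default degree is an illustrative assumption, not the setting
    used for the released files.
    """
    coeffs = np.polyfit(timestamps, follower_counts, degree)
    return np.poly1d(coeffs)  # callable: f(t) -> estimated followers


def save_and_reload(func, path):
    """Round-trip a fitted function through pickle, mirroring the per-user files."""
    with open(path, "wb") as f:
        pickle.dump(func, f)
    with open(path, "rb") as f:
        return pickle.load(f)
```

Since a polynomial fit can extrapolate poorly, estimates far outside the range of archived crawl dates are likely to be less reliable.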

4. $DATASET_NAME$_userinfo.txt.gz -- contains the user profile information crawled in October 2018. The file is tab separated and contains the following columns:

user id, twitter screen_name, user name, profile location, profile description, followers_count, friends_count, statuses_count, profile created_at, is_protected, is_verified, language, is_geo_enabled, url, timezone, source, profile_image_url

This file can be used to map user ids to screen names. Note that a user's id is fixed, while the screen name can change over time.
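Because the archive file is keyed by screen name while the tweets file uses user ids, joining them needs this mapping. A minimal sketch, assuming each profile occupies exactly one line (no embedded newlines in descriptions) and reading only the first two columns. The function name is hypothetical; screen names are lower-cased because Twitter treats them case-insensitively. Since the profiles were crawled in October 2018, a screen name seen in older archive snapshots may not appear here.

```python
import csv
import gzip


def screen_name_to_id(path):
    """Build a {screen_name (lower-cased): user_id} map from a *_userinfo.txt.gz file."""
    mapping = {}
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        # QUOTE_NONE: treat quote characters in descriptions as ordinary text.
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) >= 2:
                user_id, screen_name = row[0], row[1]
                mapping[screen_name.lower()] = user_id
    return mapping
```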

## Code

5. getFollowerHistoryArchive.py -- script used to collect follower-count data from archive.org

Run as `cat users.txt | python getFollowerHistoryArchive.py`, where users.txt contains Twitter screen names, one per line.
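The script reads screen names from standard input, one per line. A minimal sketch for preparing such a file from any iterable of screen names (for example, the second column of a *_userinfo.txt.gz file), dropping blanks and duplicates while keeping first-seen order. The function name `write_users_file` is hypothetical.

```python
def write_users_file(screen_names, path="users.txt"):
    """Write unique, non-empty screen names to `path`, one per line.

    Returns the list of names written, in first-seen order.
    """
    # dict.fromkeys keeps insertion order and drops repeated names.
    unique = dict.fromkeys(name.strip() for name in screen_names if name.strip())
    with open(path, "w", encoding="utf-8") as f:
        for name in unique:
            f.write(name + "\n")
    return list(unique)
```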

 

Abstract

We study the evolution of the number of retweets received by Twitter users over the course of their “careers” on the platform. We find that on average the number of retweets received by users tends to increase over time. This is partly expected because users tend to gradually accumulate followers. Normalizing by the number of followers, however, reveals that the relative, per-follower retweet rate tends to be non-monotonic, maximized at a “peak age” after which it does not increase, or even decreases. We develop a simple mathematical model of the process behind this phenomenon, which assumes a constantly growing number of followers, each of whom loses interest over time. We show that this model is sufficient to explain the non-monotonic nature of per-follower retweet rates, without any assumptions about the quality of content posted at different times.

Files


Files (2.2 GB)

| MD5 checksum | Size |
|---|---|
| c08cda9ceb9a6e7aabdef8d5bddd99d4 | 550.4 kB |
| c10de35bb35dde820be4454a5466e036 | 2.8 MB |
| 2408fa137623c132459a1efaeb12effe | 598.6 kB |
| 6f41e111fa3d846ceb512859c9b492d4 | 3.2 MB |
| 62f5ac26c3787129fae2d858090aba47 | 270.9 MB |
| fabe96965c7933087aa7196123b23d85 | 103.0 MB |
| 1a1a7c6615ed1335d516ddb2a5aabb98 | 481.7 MB |
| 3df63391dc1434325a5e06b86c2f10d7 | 120.1 MB |
| e3057cdcd168813517b98edd5e0a1063 | 5.3 kB |
| a142eedadd1734c7dfa1047cc6231f79 | 2.1 MB |
| b9968c3bbd294ae5a3a002abdf01afa3 | 12.0 MB |
| 537e99cbde5659d1d5fc493faf2b252e | 976.7 MB |
| 9b3e6691e33988a9542e4c1a6a4ad5bf | 77.2 MB |
| d45ad1b805768b0afad6644042a8ab34 | 2.0 kB |
| cb118eeaf462483be98d871e25ee2f0e | 144.3 kB |
| 113663e7a6bb169f628a63f7c8a6c999 | 863.6 kB |
| 888aedddde5f46df82c204530d1d3a1f | 117.5 MB |
| ac213ebe608c145ea4d5e394dcc66b15 | 500.0 kB |
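Each file in the listing has an MD5 checksum. A minimal sketch for verifying a download with Python's standard `hashlib`, reading in chunks so the multi-hundred-megabyte files need not fit in memory. The function name `md5sum` and the 1 MiB chunk size are illustrative choices; compare the result with the checksum listed for the corresponding file.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```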