Published May 17, 2017 | Version v1
Dataset Open

URLs from tweets for a 2014 sample of Twitter users and for a set of computer scientists

  • 1. University of Sheffield


Data collector:

  • 1. L3S Research Center


The files in this dataset are used to analyse the tweeting behaviour of computer scientists on Twitter. They comprise

  • a set of 989,529 tweet-URL pairs (tweets_2014_researcher.tsv.bz2) from 2014 from 6,271 users of the computer scientists sample, specified by time, tweet id, user id, and URL,
  • a set of 300,053,850 tweet ids (tweets_2014_sample.tsv.bz2) from the 1% Twitter stream sample from 2014,
  • a set of 605,080 tweet-URL pairs (tweets_2014_sample_6694_users.tsv.bz2) from the 1% Twitter stream sample from 2014 for 6,694 users specified by time, tweet id, user id, and URL,
  • a set of the top 10,000 host names (MAG_hosts_10000.tsv) from the Microsoft Academic Graph data, specified by rank, URL count, and host name, and
  • a set of 340 host names of URL shortening services (url_shortening_services.tsv).
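As a minimal sketch of how the tweet-URL files might be read, the following Python snippet streams a bz2-compressed TSV file and extracts the host name of each URL. It assumes that the files are tab-separated, have no header row, and hold the four columns in the order listed above (time, tweet id, user id, URL); none of this is confirmed by the description. The function names read_tweet_urls, host_of, and is_shortened are hypothetical helpers, and the set of shortener hosts is passed in by the caller because the layout of url_shortening_services.tsv is not specified here.

```python
import bz2
import csv
from urllib.parse import urlsplit


def read_tweet_urls(path):
    """Yield (time, tweet_id, user_id, url) string tuples from a .tsv.bz2 file.

    Assumption: four tab-separated columns in this order, no header row.
    """
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) != 4:
                continue  # skip lines that do not match the assumed layout
            yield tuple(row)


def host_of(url):
    """Return the lower-cased host name of a URL, or "" if none can be parsed."""
    try:
        return (urlsplit(url).hostname or "").lower()
    except ValueError:
        return ""


def is_shortened(url, shortener_hosts):
    """Check whether the URL's host is in a caller-supplied set of shortener hosts."""
    return host_of(url) in shortener_hosts
```

Streaming the file row by row keeps memory use low, which may matter for the larger files in this dataset.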

In addition, the following rankings (based on the odds ratio) of domains, hosts, and URLs that appear in both the researcher dataset and the sample are included:

  • domains_by_odds_ratio.tsv.bz2 - a ranking of 61,860 domains,
  • hosts_by_odds_ratio.tsv.bz2 - a ranking of 80,384 hosts,
  • publisher_domains_by_odds_ratio.tsv.bz2 - a ranking of 924 publisher domains,
  • publisher_urls_by_odds_ratio.tsv.bz2 - a ranking of 4,227 publisher URLs.
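The description does not state the exact formula behind these rankings. As an illustration only, the sketch below uses the standard odds ratio: for an item (domain, host, or URL) with a occurrences in the researcher dataset and c occurrences in the sample, and with b and d counting all other occurrences in the respective dataset, the odds ratio is (a/b)/(c/d). The function name odds_ratios and its inputs are hypothetical, and the authors' actual computation may differ, for example in smoothing or filtering.

```python
from collections import Counter


def odds_ratios(researcher_items, sample_items):
    """Rank items that occur in both collections by odds ratio, highest first.

    Assumption: standard odds ratio (a/b)/(c/d); not confirmed as the
    method used to build the published rankings.
    """
    researcher = Counter(researcher_items)
    sample = Counter(sample_items)
    researcher_total = sum(researcher.values())
    sample_total = sum(sample.values())

    ratios = {}
    # Only items present in both datasets are ranked, so a > 0 and c > 0.
    for item in researcher.keys() & sample.keys():
        a = researcher[item]
        b = researcher_total - a  # other items in the researcher dataset
        c = sample[item]
        d = sample_total - c      # other items in the sample
        if b == 0:
            continue  # odds are undefined when the item is the only one
        ratios[item] = (a * d) / (b * c)
    return sorted(ratios.items(), key=lambda pair: pair[1], reverse=True)
```

A value above 1 indicates that an item is relatively more frequent among the researchers than in the sample, and a value below 1 indicates the opposite.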


This is an updated and extended version of 10.5281/zenodo.154583: a new sample of users has been used, resulting in an updated file tweets_2014_sample_6694_users.tsv.bz2. In addition, domain, host, and URL rankings have been added.


Files (2.3 GB)

Name Size
445.0 kB
619.7 kB
298.2 kB
8.1 kB
84.3 kB
32.0 MB
2.3 GB
12.2 MB
3.0 kB

Additional details

Related works

Is new version of
10.5281/zenodo.154583 (DOI)
Is supplement to
10.5281/zenodo.12942 (DOI)
10.1371/journal.pone.0179630 (DOI)