Published May 17, 2017 | Version v1 | Dataset | Open access
URLs from tweets for a 2014 sample of Twitter users and for a set of computer scientists
Description
The files in this dataset are used to analyse the tweeting behaviour of computer scientists on Twitter. They comprise
- a set of 989,529 tweet-URL pairs (tweets_2014_researcher.tsv.bz2) posted in 2014 by 6,271 users from the computer scientist sample in https://zenodo.org/record/12942, each specified by time, tweet id, user id, and URL,
- a set of 300,053,850 tweet ids (tweets_2014_sample.tsv.bz2) from the 1% Twitter stream sample for 2014,
- a set of 605,080 tweet-URL pairs (tweets_2014_sample_6694_users.tsv.bz2) posted by 6,694 users in the 1% Twitter stream sample for 2014, each specified by time, tweet id, user id, and URL,
- a set of the top 10,000 host names (MAG_hosts_10000.tsv) from the Microsoft Academic Graph data (http://blogs.msdn.com/b/msr_er/archive/2015/06/26/announcing-the-microsoft-academic-graph-let-the-research-begin.aspx), specified by rank, URL count, and host name, and
- a set of 340 host names of URL shortening services (url_shortening_services.tsv).
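As a sketch of how the tweet-URL files might be loaded, the following Python assumes that each line of tweets_2014_researcher.tsv.bz2 (or tweets_2014_sample_6694_users.tsv.bz2) holds the four fields in the order listed above (time, tweet id, user id, URL), separated by tabs, with no header row. It also assumes that the host name is the last tab-separated field on each line of url_shortening_services.tsv. Neither layout is documented in this record, so inspect a few lines of the actual files first; all function and class names are hypothetical.

```python
import bz2
import csv
from typing import Iterable, Iterator, NamedTuple, Set
from urllib.parse import urlsplit


class TweetURL(NamedTuple):
    """One tweet-URL pair; field order assumed from the dataset description."""
    time: str
    tweet_id: str
    user_id: str
    url: str


def parse_tweet_urls(lines: Iterable[str]) -> Iterator[TweetURL]:
    """Parse tab-separated lines into TweetURL records, skipping malformed lines."""
    # QUOTE_NONE: treat quote characters inside URLs as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) == 4:
            yield TweetURL(*row)


def read_tweet_urls(path: str) -> Iterator[TweetURL]:
    """Stream records from a .tsv.bz2 file without decompressing it to disk."""
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from parse_tweet_urls(handle)


def parse_shortener_hosts(lines: Iterable[str]) -> Set[str]:
    """Collect lower-cased host names, taking the last tab-separated field of each line."""
    hosts = set()
    for line in lines:
        host = line.rstrip("\r\n").split("\t")[-1].strip().lower()
        if host:
            hosts.add(host)
    return hosts


def is_shortened(url: str, shortener_hosts: Set[str]) -> bool:
    """True if the URL's host exactly matches a known shortening service."""
    host = urlsplit(url).hostname  # lower-cased, port removed; None if absent
    return host is not None and host in shortener_hosts
```

To count researcher URLs that point to a shortener, one could build the host set from the opened url_shortening_services.tsv and sum `is_shortened(r.url, hosts)` over `read_tweet_urls("tweets_2014_researcher.tsv.bz2")`. Matching is on the exact host name, so a subdomain of a listed service would not be caught, and a header line in the host file (if there is one) would simply end up as a harmless extra entry.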
In addition, the following rankings (based on the odds ratio) of domains, hosts, and URLs that appear in both the researcher dataset and the sample are included:
- domains_by_odds_ratio.tsv.bz2 - a ranking of 61,860 domains,
- hosts_by_odds_ratio.tsv.bz2 - a ranking of 80,384 hosts,
- publisher_domains_by_odds_ratio.tsv.bz2 - a ranking of 924 publisher domains,
- publisher_urls_by_odds_ratio.tsv.bz2 - a ranking of 4,227 publisher URLs.
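The record does not state how the odds ratios were computed. One common definition, used here purely as an assumption, treats each tweet-URL pair as an observation in a 2×2 table: a and b count researcher pairs whose URL does or does not belong to a given item (domain, host, or URL), c and d count the same for the sample, and the odds ratio is (a/b)/(c/d) = ad/(bc). The sketch restricts the ranking to items seen in both inputs, mirroring the description above, and adds a hypothetical `pseudocount` (0.5 by default, the Haldane-Anscombe correction) so that an item covering every pair on one side does not cause a division by zero. It is an illustration of the statistic, not a reproduction of the published rankings.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def odds_ratio(a: int, b: int, c: int, d: int, pseudocount: float = 0.5) -> float:
    """Odds ratio (a/b) / (c/d) of a 2x2 contingency table.

    a: researcher pairs with the item, b: researcher pairs without it,
    c: sample pairs with the item,     d: sample pairs without it.
    pseudocount is added to every cell before dividing.
    """
    a, b, c, d = (x + pseudocount for x in (a, b, c, d))
    return (a * d) / (b * c)


def rank_by_odds_ratio(
    researcher_items: Iterable[str],
    sample_items: Iterable[str],
    pseudocount: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank items occurring in both inputs by odds ratio, highest first."""
    r_counts = Counter(researcher_items)
    s_counts = Counter(sample_items)
    r_total = sum(r_counts.values())
    s_total = sum(s_counts.values())
    scores: Dict[str, float] = {}
    for item in r_counts.keys() & s_counts.keys():  # only items seen in both
        a = r_counts[item]
        c = s_counts[item]
        scores[item] = odds_ratio(a, r_total - a, c, s_total - c, pseudocount)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Passing host names (for example `urlsplit(url).hostname` for each pair) would give a host ranking; a domain ranking would additionally need a rule for collapsing hosts into registered domains, which this record does not describe.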
Files (2.3 GB)

Size | MD5 checksum |
---|---|
445.0 kB | 299e3ec2469d3a91582e592a2fc0aa1e |
619.7 kB | bd959f2b67bc50e746a4740d8969f18c |
298.2 kB | bf92fe9d92a45949d44037a81356b82b |
8.1 kB | 10e489478e9076e76d158c18e95f51bc |
84.3 kB | e5f563f85a2ea56fac3b20109e1c2402 |
32.0 MB | 6c466537064b5a5574734f418893b199 |
2.3 GB | d0ea5705cb86480a0f22a1c7439533b4 |
12.2 MB | 2dff10a6301cb97c53a653a65019199c |
3.0 kB | 1f040245142c7309b9c46f897f79f7ce |
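Each file is published with an MD5 checksum. A minimal Python sketch for verifying a download against one of them reads the file in chunks, so the 2.3 GB file never has to fit in memory. The function names and the 1 MiB chunk size are arbitrary choices, and which checksum belongs to which file has to be taken from the record itself.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading chunk_size bytes at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected: str) -> bool:
    """Compare a file's MD5 with an expected value such as 'md5:<hex>' (case-insensitive)."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```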
Additional details
Related works
- Is new version of
  - 10.5281/zenodo.154583 (DOI)
- Is supplement to
  - 10.5281/zenodo.12942 (DOI)
  - 10.1371/journal.pone.0179630 (DOI)