Dataset Open Access

URLs from tweets for a 2014 sample of Twitter users and for a set of computer scientists

Robert Jäschke


JSON-LD (schema.org) Export

{
  "description": "<p>The files in this dataset are used to analyse the tweeting behaviour of computer scientists on Twitter. They comprise</p>\n\n<ul>\n\t<li>a set of 989,529 tweet-URL pairs (<em>tweets_2014_researcher.tsv.bz2</em>) from 2014 from 6,271 users of the computer scientists sample in https://zenodo.org/record/12942 specified by time, tweet id, user id, and URL,</li>\n\t<li>a set of 300,053,850 tweet ids (<em>tweets_2014_sample.tsv.bz2</em>) from the 1% Twitter stream sample from 2014,</li>\n\t<li>a set of 605,080 tweet-URL pairs (<em>tweets_2014_sample_6694_users.tsv.bz2</em>) from the 1% Twitter stream sample from 2014 for 6,694 users specified by time, tweet id, user id, and URL,</li>\n\t<li>a set of the top 10,000 host names (<em>MAG_hosts_10000.tsv</em>) from the Microsoft Academic Graph data (http://blogs.msdn.com/b/msr_er/archive/2015/06/26/announcing-the-microsoft-academic-graph-let-the-research-begin.aspx), specified by rank, URL count, and host name, and</li>\n\t<li>a set of 340 host names of URL shortening services (<em>url_shortening_services.tsv</em>).</li>\n</ul>\n\n<p>In addition, the following rankings (based on the odds ratio) of domains, hosts, and URLs that appear in both the researcher dataset and the sample are included:</p>\n\n<ul>\n\t<li><em>domains_by_odds_ratio.tsv.bz2</em> - a ranking of 61,860 domains,</li>\n\t<li><em>hosts_by_odds_ratio.tsv.bz2</em> - a ranking of 80,384 hosts,</li>\n\t<li><em>publisher_domains_by_odds_ratio.tsv.bz2</em> - a ranking of 924 publisher domains,</li>\n\t<li><em>publisher_urls_by_odds_ratio.tsv.bz2</em> - a ranking of 4,227 publisher URLs.</li>\n</ul>", 
  "license": "https://creativecommons.org/licenses/by-sa/4.0/legalcode", 
  "creator": [
    {
      "affiliation": "University of Sheffield", 
      "@type": "Person", 
      "name": "Robert J\u00e4schke"
    }
  ], 
  "url": "https://zenodo.org/record/580587", 
  "datePublished": "2017-05-17", 
  "keywords": [
    "Twitter", 
    "tweets"
  ], 
  "contributor": [
    {
      "affiliation": "L3S Research Center", 
      "@type": "Person", 
      "name": "Asmelash, Teka Hadgu"
    }
  ], 
  "@context": "https://schema.org/", 
  "distribution": [
    {
      "contentUrl": "https://zenodo.org/api/files/1fdd3abb-448e-4b42-a9c8-627e58c7823c/domains_by_odds_ratio.tsv.bz2", 
      "encodingFormat": "bz2", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/1fdd3abb-448e-4b42-a9c8-627e58c7823c/hosts_by_odds_ratio.tsv.bz2", 
      "encodingFormat": "bz2", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/1fdd3abb-448e-4b42-a9c8-627e58c7823c/MAG_hosts_10000.tsv", 
      "encodingFormat": "tsv", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/1fdd3abb-448e-4b42-a9c8-627e58c7823c/publisher_domains_by_odds_ratio.tsv.bz2", 
      "encodingFormat": "bz2", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/1fdd3abb-448e-4b42-a9c8-627e58c7823c/publisher_urls_by_odds_ratio.tsv.bz2", 
      "encodingFormat": "bz2", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/1fdd3abb-448e-4b42-a9c8-627e58c7823c/tweets_2014_researcher.tsv.bz2", 
      "encodingFormat": "bz2", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/1fdd3abb-448e-4b42-a9c8-627e58c7823c/tweets_2014_sample_6694_users.tsv.bz2", 
      "encodingFormat": "bz2", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/1fdd3abb-448e-4b42-a9c8-627e58c7823c/tweets_2014_sample.tsv.bz2", 
      "encodingFormat": "bz2", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/1fdd3abb-448e-4b42-a9c8-627e58c7823c/url_shortening_services.tsv", 
      "encodingFormat": "tsv", 
      "@type": "DataDownload"
    }
  ], 
  "identifier": "https://doi.org/10.5281/zenodo.580587", 
  "@id": "https://doi.org/10.5281/zenodo.580587", 
  "@type": "Dataset", 
  "name": "URLs from tweets for a 2014 sample of Twitter users and for a set of computer scientists"
}
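
The description above lists the tweet-URL files as tab-separated rows of time, tweet id, user id, and URL, and mentions rankings based on the odds ratio. The following is a minimal Python sketch of how such a file could be read and how an odds ratio per host could be computed. It rests on assumptions not confirmed by the record: that the columns appear in the listed order, that there is no header row, and that the files are UTF-8 encoded. The function names (`read_tweet_urls`, `host_counts`, `odds_ratio`) are hypothetical, and the odds ratio shown is the textbook definition, which may differ from the exact computation behind the published rankings.

```python
import bz2
import csv
from collections import Counter
from urllib.parse import urlsplit


def read_tweet_urls(path):
    """Yield (time, tweet_id, user_id, url) from a bz2-compressed TSV file.

    Assumption: four tab-separated columns in this order, no header row.
    """
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 4:
                continue  # skip rows that do not match the assumed layout
            yield row[0], row[1], row[2], row[3]


def host_counts(rows):
    """Count how often each (lower-cased) host name occurs in the URLs."""
    counts = Counter()
    for _time, _tweet_id, _user_id, url in rows:
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed URL
        if host:
            counts[host] += 1
    return counts


def odds_ratio(count_a, total_a, count_b, total_b):
    """Textbook odds ratio of an item in dataset A relative to dataset B.

    No smoothing is applied, so this raises ZeroDivisionError when the item
    is absent from B or accounts for every URL in either dataset.
    """
    odds_a = count_a / (total_a - count_a)
    odds_b = count_b / (total_b - count_b)
    return odds_a / odds_b
```

With the files downloaded locally, `host_counts(read_tweet_urls("tweets_2014_researcher.tsv.bz2"))` and the same call on `tweets_2014_sample_6694_users.tsv.bz2` would give the per-host counts that `odds_ratio` expects, together with the total number of URLs in each file.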
                  All versions   This version
Views             757            759
Downloads         95             95
Data volume       39.8 GB        39.8 GB
Unique views      745            747
Unique downloads  67             67
