Dataset Open Access

URLs from tweets for a 2014 sample of Twitter users and for a set of computer scientists

Robert Jäschke


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Twitter</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">tweets</subfield>
  </datafield>
  <controlfield tag="005">20171102081527.0</controlfield>
  <datafield tag="500" ind1=" " ind2=" ">
    <subfield code="a">This is an updated and extended version of 10.5281/zenodo.154583 in which a new sample of users was used, resulting in an updated file tweets_2014_sample_6694_users.tsv.bz2. In addition, domain, host, and URL rankings have been added.</subfield>
  </datafield>
  <controlfield tag="001">580587</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">L3S Research Center</subfield>
    <subfield code="4">col</subfield>
    <subfield code="a">Asmelash, Teka Hadgu</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">444954</subfield>
    <subfield code="z">md5:299e3ec2469d3a91582e592a2fc0aa1e</subfield>
    <subfield code="u">https://zenodo.org/record/580587/files/domains_by_odds_ratio.tsv.bz2</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">619682</subfield>
    <subfield code="z">md5:bd959f2b67bc50e746a4740d8969f18c</subfield>
    <subfield code="u">https://zenodo.org/record/580587/files/hosts_by_odds_ratio.tsv.bz2</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">298167</subfield>
    <subfield code="z">md5:bf92fe9d92a45949d44037a81356b82b</subfield>
    <subfield code="u">https://zenodo.org/record/580587/files/MAG_hosts_10000.tsv</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">8120</subfield>
    <subfield code="z">md5:10e489478e9076e76d158c18e95f51bc</subfield>
    <subfield code="u">https://zenodo.org/record/580587/files/publisher_domains_by_odds_ratio.tsv.bz2</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">84262</subfield>
    <subfield code="z">md5:e5f563f85a2ea56fac3b20109e1c2402</subfield>
    <subfield code="u">https://zenodo.org/record/580587/files/publisher_urls_by_odds_ratio.tsv.bz2</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">31993560</subfield>
    <subfield code="z">md5:6c466537064b5a5574734f418893b199</subfield>
    <subfield code="u">https://zenodo.org/record/580587/files/tweets_2014_researcher.tsv.bz2</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">12227572</subfield>
    <subfield code="z">md5:2dff10a6301cb97c53a653a65019199c</subfield>
    <subfield code="u">https://zenodo.org/record/580587/files/tweets_2014_sample_6694_users.tsv.bz2</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">2295230252</subfield>
    <subfield code="z">md5:d0ea5705cb86480a0f22a1c7439533b4</subfield>
    <subfield code="u">https://zenodo.org/record/580587/files/tweets_2014_sample.tsv.bz2</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">2967</subfield>
    <subfield code="z">md5:1f040245142c7309b9c46f897f79f7ce</subfield>
    <subfield code="u">https://zenodo.org/record/580587/files/url_shortening_services.tsv</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2017-05-17</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire_data</subfield>
    <subfield code="o">oai:zenodo.org:580587</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="4">
    <subfield code="v">12</subfield>
    <subfield code="p">PLoS ONE</subfield>
    <subfield code="n">6</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">University of Sheffield</subfield>
    <subfield code="a">Robert Jäschke</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">URLs from tweets for a 2014 sample of Twitter users and for a set of computer scientists</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by-sa/4.0/</subfield>
    <subfield code="a">Creative Commons Attribution Share-Alike 4.0</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;The files in this dataset are used to analyse the tweeting behaviour of computer scientists on Twitter. They comprise&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;a set of 989,529 tweet-URL pairs (&lt;em&gt;tweets_2014_researcher.tsv.bz2&lt;/em&gt;) from 2014, posted by 6,271 users from the sample of computer scientists in https://zenodo.org/record/12942, specified by time, tweet id, user id, and URL,&lt;/li&gt;
	&lt;li&gt;a set of 300,053,850 tweet ids (&lt;em&gt;tweets_2014_sample.tsv.bz2&lt;/em&gt;) from the 1% Twitter stream sample of 2014,&lt;/li&gt;
	&lt;li&gt;a set of 605,080 tweet-URL pairs (&lt;em&gt;tweets_2014_sample_6694_users.tsv.bz2&lt;/em&gt;) from the 1% Twitter stream sample of 2014 for 6,694 users, specified by time, tweet id, user id, and URL,&lt;/li&gt;
	&lt;li&gt;a set of the top 10,000 host names (&lt;em&gt;MAG_hosts_10000.tsv&lt;/em&gt;) from the Microsoft Academic Graph data (http://blogs.msdn.com/b/msr_er/archive/2015/06/26/announcing-the-microsoft-academic-graph-let-the-research-begin.aspx), specified by rank, URL count, and host name, and&lt;/li&gt;
	&lt;li&gt;a set of 340 host names of URL shortening services (&lt;em&gt;url_shortening_services.tsv&lt;/em&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition, the following rankings (based on the odds ratio) of domains, hosts, and URLs that appear in both the researcher dataset and the sample are included:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;&lt;em&gt;domains_by_odds_ratio.tsv.bz2&lt;/em&gt; - a ranking of 61,860 domains,&lt;/li&gt;
	&lt;li&gt;&lt;em&gt;hosts_by_odds_ratio.tsv.bz2&lt;/em&gt; - a ranking of 80,384 hosts,&lt;/li&gt;
	&lt;li&gt;&lt;em&gt;publisher_domains_by_odds_ratio.tsv.bz2&lt;/em&gt; - a ranking of 924 publisher domains,&lt;/li&gt;
	&lt;li&gt;&lt;em&gt;publisher_urls_by_odds_ratio.tsv.bz2&lt;/em&gt; - a ranking of 4,227 publisher URLs.&lt;/li&gt;
&lt;/ul&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isNewVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.154583</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isSupplementTo</subfield>
    <subfield code="a">10.5281/zenodo.12942</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isSupplementTo</subfield>
    <subfield code="a">10.1371/journal.pone.0179630</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.580587</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>
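Each 856 field in the record above lists one downloadable file with its size in bytes ($s), an MD5 checksum ($z, prefixed with "md5:"), and its URL ($u). The following is a minimal sketch, assuming only the Python standard library, of how these fields could be extracted from the export and used to check a downloaded copy. The function names file_entries and verify are hypothetical and not part of Zenodo or any MARC tooling.

```python
import hashlib
import os
import xml.etree.ElementTree as ET

# Namespace declared on <record> in the MARC21 slim export
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def file_entries(marc_xml: str) -> list[dict]:
    """Hypothetical helper: collect url, size and md5 from every 856 field."""
    root = ET.fromstring(marc_xml)
    entries = []
    for field in root.findall("marc:datafield[@tag='856']", MARC_NS):
        # Map subfield code -> text: $u URL, $s size in bytes, $z "md5:<hex digest>"
        sub = {sf.get("code"): sf.text for sf in field.findall("marc:subfield", MARC_NS)}
        entries.append({
            "url": sub["u"],
            "size": int(sub["s"]),
            "md5": sub["z"].removeprefix("md5:"),
        })
    return entries


def verify(path: str, size: int, md5: str, chunk_size: int = 1 << 20) -> bool:
    """Hypothetical helper: True if the local file has the expected size and MD5."""
    if os.path.getsize(path) != size:
        return False
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Hash in fixed-size chunks so large files (e.g. the ~2.3 GB sample) are never fully loaded
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == md5
```

Checking the size first is a cheap way to reject truncated downloads before hashing; for tweets_2014_sample.tsv.bz2 (2,295,230,252 bytes) the MD5 pass is the expensive step.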
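The abstract describes the two tweet-URL pair files as "specified by time, tweet id, user id, and URL" and also ships a list of URL shortening services. The record does not document the exact file layout, so the sketch below assumes tab-separated UTF-8 text with no header row and exactly those four columns in that order, and assumes url_shortening_services.tsv carries one host name in its first column. Excluding shortener hosts when counting URLs per host is shown only as one plausible use of that list; how the authors used it is not stated here. All names (TweetUrl, read_tweet_urls, load_shorteners, count_hosts) are hypothetical.

```python
import bz2
import csv
from collections import Counter
from typing import Iterable, Iterator, NamedTuple
from urllib.parse import urlsplit


class TweetUrl(NamedTuple):
    time: str      # kept as raw text; the timestamp format is not documented in the record
    tweet_id: int
    user_id: int
    url: str


def read_tweet_urls(path: str) -> Iterator[TweetUrl]:
    """Stream a bzip2-compressed tweet-URL TSV (assumed: no header, 4 columns)."""
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as fh:
        # QUOTE_NONE: URLs may contain quote characters that must not be interpreted
        for time, tweet_id, user_id, url in csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE):
            yield TweetUrl(time, int(tweet_id), int(user_id), url)


def load_shorteners(path: str) -> set[str]:
    """Read shortener host names (assumed: host name in the first column)."""
    with open(path, encoding="utf-8", newline="") as fh:
        return {row[0].strip().lower() for row in csv.reader(fh, delimiter="\t") if row}


def count_hosts(rows: Iterable[TweetUrl], shorteners: set[str]) -> Counter:
    """Count URLs per (lower-cased) host, skipping hosts of URL shortening services."""
    counts = Counter()
    for row in rows:
        host = (urlsplit(row.url).hostname or "").lower()
        if host and host not in shorteners:
            counts[host] += 1
    return counts
```

Because read_tweet_urls is a generator, the files can be processed in a single streaming pass without decompressing them to disk first.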
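The rankings are described only as "based on the odds ratio" of items that appear in both the researcher dataset and the sample; the record does not say which units were counted or whether any smoothing was applied. As an illustration, the sketch below uses the standard 2x2 odds ratio on per-item counts: with a occurrences of an item among N_R researcher occurrences and c occurrences among N_S sample occurrences, OR = (a / (N_R - a)) / (c / (N_S - c)). Values above 1 mean the item is over-represented among the computer scientists. The function names odds_ratio and rank_by_odds_ratio are hypothetical, and the per-item counts are assumed to be available already.

```python
import math
from collections import Counter


def odds_ratio(a: int, n_r: int, c: int, n_s: int) -> float:
    """Odds ratio (a / (n_r - a)) / (c / (n_s - c)) for a single item.

    a: item count in the researcher data, n_r: total researcher count,
    c: item count in the sample, n_s: total sample count.
    """
    denominator = (n_r - a) * c
    # A zero denominator is reported as infinite odds ratio
    return math.inf if denominator == 0 else a * (n_s - c) / denominator


def rank_by_odds_ratio(researcher: Counter, sample: Counter) -> list[tuple[str, float]]:
    """Rank items occurring in both datasets by descending odds ratio."""
    n_r = sum(researcher.values())
    n_s = sum(sample.values())
    shared = researcher.keys() & sample.keys()  # only items present in both datasets
    scored = [(item, odds_ratio(researcher[item], n_r, sample[item], n_s)) for item in shared]
    # Highest odds ratio first; ties broken alphabetically so the output order is deterministic
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

For example, a host with 30 of 100 researcher URLs and 10 of 1,000 sample URLs gets (30/70)/(10/990) ≈ 42.4, while a host with 50 of 100 and 800 of 1,000 gets (50/50)/(800/200) = 0.25.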
