Dataset Open Access

All Your Script Are Belong to Us: Collecting and Analyzing JavaScript Code from 10K Sites for 9 Months

Dimitris Mitropoulos; Panos Louridas; Vitalis Salis; Diomidis Spinellis


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.5281/zenodo.2593266</identifier>
  <creators>
    <creator>
      <creatorName>Dimitris Mitropoulos</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-5061-9018</nameIdentifier>
      <affiliation>Athens University of Economics and Business</affiliation>
    </creator>
    <creator>
      <creatorName>Panos Louridas</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-3971-4612</nameIdentifier>
      <affiliation>Athens University of Economics and Business</affiliation>
    </creator>
    <creator>
      <creatorName>Vitalis Salis</creatorName>
      <affiliation>Greek Research and Technology Network</affiliation>
    </creator>
    <creator>
      <creatorName>Diomidis Spinellis</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0003-4231-1897</nameIdentifier>
      <affiliation>Athens University of Economics and Business</affiliation>
    </creator>
  </creators>
  <titles>
    <title>All Your Script Are Belong to Us: Collecting and Analyzing JavaScript Code from 10K Sites for 9 Months</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2019</publicationYear>
  <dates>
    <date dateType="Issued">2019-03-14</date>
  </dates>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/2593266</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.2593265</relatedIdentifier>
  </relatedIdentifiers>
  <rightsList>
    <rights rightsURI="http://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;We present a massive dataset (~2 TB) of client-side JavaScript code. Specifically, we have collected and stored on a daily basis JavaScript code from Alexa&amp;#39;s Top 10000 web sites (~7.5 GB per day) for nine consecutive months. Our collection involved both inline scripts extracted from each web site&amp;#39;s main page and external scripts linked from it. To help researchers identify similar scripts and examine their popularity and evolution, we have produced hashes that represent the scripts&amp;#39; logical structure. Furthermore, we have analyzed the resulting dataset with well-established static analysis tools, generating additional metadata including reports with quality bugs and vulnerable libraries.&lt;/p&gt;</description>
  </descriptions>
</resource>
                   All versions   This version
Views              82             82
Downloads          10             10
Data volume        363.5 GB       363.5 GB
Unique views       73             73
Unique downloads   6              6
