Dataset Open Access

All Your Script Are Belong to Us: Collecting and Analyzing JavaScript Code from 10K Sites for 9 Months

Dimitris Mitropoulos; Panos Louridas; Vitalis Salis; Diomidis Spinellis


JSON-LD (schema.org) Export

{
  "description": "<p>We present a massive dataset (~2 TB) of client-side JavaScript code. Specifically, we collected and stored, on a daily basis, the JavaScript code of Alexa&#39;s Top 10000 web sites (~7.5 GB per day) for nine consecutive months. Our collection includes both inline scripts extracted from each web site&#39;s main page and external scripts linked from it. To help researchers identify similar scripts and examine their popularity and evolution, we have produced hashes that represent the scripts&#39; logical structure. Furthermore, we have analyzed the resulting dataset with well-established static analysis tools, generating additional metadata that includes reports of quality bugs and vulnerable libraries.</p>", 
  "license": "http://creativecommons.org/licenses/by/4.0/legalcode", 
  "creator": [
    {
      "affiliation": "Athens University of Economics and Business", 
      "@id": "https://orcid.org/0000-0002-5061-9018", 
      "@type": "Person", 
      "name": "Dimitris Mitropoulos"
    }, 
    {
      "affiliation": "Athens University of Economics and Business", 
      "@id": "https://orcid.org/0000-0002-3971-4612", 
      "@type": "Person", 
      "name": "Panos Louridas"
    }, 
    {
      "affiliation": "Greek Research and Technology Network", 
      "@type": "Person", 
      "name": "Vitalis Salis"
    }, 
    {
      "affiliation": "Athens University of Economics and Business", 
      "@id": "https://orcid.org/0000-0003-4231-1897", 
      "@type": "Person", 
      "name": "Diomidis Spinellis"
    }
  ], 
  "url": "https://zenodo.org/record/2593266", 
  "datePublished": "2019-03-14", 
  "@context": "https://schema.org/", 
  "distribution": [
    {
      "contentUrl": "https://zenodo.org/api/files/7274a32b-0fee-46af-878a-f25d9f59a2f3/defects-jshint.json", 
      "@type": "DataDownload", 
      "fileFormat": "json"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/7274a32b-0fee-46af-878a-f25d9f59a2f3/defects-retire.json", 
      "@type": "DataDownload", 
      "fileFormat": "json"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/7274a32b-0fee-46af-878a-f25d9f59a2f3/hashes.tar.gz", 
      "@type": "DataDownload", 
      "fileFormat": "gz"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/7274a32b-0fee-46af-878a-f25d9f59a2f3/js_evolution_data.tar.gz", 
      "@type": "DataDownload", 
      "fileFormat": "gz"
    }
  ], 
  "identifier": "https://doi.org/10.5281/zenodo.2593266", 
  "@id": "https://doi.org/10.5281/zenodo.2593266", 
  "@type": "Dataset", 
  "name": "All Your Script Are Belong to Us: Collecting and Analyzing JavaScript Code from 10K Sites for 9 Months"
}
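The description says the dataset ships hashes that "represent the scripts' logical structure" so that similar scripts can be grouped, but the record does not say how those hashes are computed. The following is a hypothetical sketch of the general idea, not the authors' method. It assumes a crude regex tokenizer that maps identifiers to `ID`, string literals to `STR` and numbers to `NUM`, keeps keywords and punctuation verbatim, drops comments and whitespace, and hashes the resulting token sequence with SHA-256. The names `structural_tokens` and `structural_hash` are illustrative only, and the tokenizer does not handle regular-expression literals or `${...}` nesting inside template literals.

```python
import hashlib
import re

# Hypothetical illustration: the dataset record does not specify its hashing scheme.
KEYWORDS = {
    "break", "case", "catch", "class", "const", "continue", "debugger",
    "default", "delete", "do", "else", "export", "extends", "finally",
    "for", "function", "if", "import", "in", "instanceof", "let", "new",
    "return", "super", "switch", "this", "throw", "try", "typeof", "var",
    "void", "while", "with", "yield", "null", "true", "false",
}

# Crude JavaScript tokenizer; every input character matches one branch.
TOKEN_RE = re.compile(
    r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|`(?:\\.|[^`\\])*`)
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<word>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<punct>[^\sA-Za-z0-9_$])
  | (?P<space>\s+)
    """,
    re.VERBOSE | re.DOTALL,
)


def structural_tokens(source: str) -> list[str]:
    """Reduce JavaScript source to a sequence of abstract token types."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind in ("comment", "space"):
            continue  # formatting and comments carry no structure
        if kind == "string":
            tokens.append("STR")
        elif kind == "number":
            tokens.append("NUM")
        elif kind == "word":
            # keywords shape the structure; identifier names do not
            tokens.append(text if text in KEYWORDS else "ID")
        else:
            tokens.append(text)  # punctuation and operators kept verbatim
    return tokens


def structural_hash(source: str) -> str:
    """SHA-256 hex digest of the space-joined token-type sequence."""
    joined = " ".join(structural_tokens(source))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


a = "function add(x, y) { return x + y; }  // sum"
b = "function plus(p, q) {\n  return p + q;\n}"
c = "function add(x, y) { return x * y; }"
print(structural_hash(a) == structural_hash(b))  # True
print(structural_hash(a) == structural_hash(c))  # False
```

Under this scheme, renaming variables, reformatting, or editing comments leaves the hash unchanged, while changing an operator or the control flow changes it. A real pipeline would more likely hash an AST produced by a proper JavaScript parser rather than rely on a regex tokenizer.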