Published March 14, 2019 | Version v1
Dataset Open

All Your Script Are Belong to Us: Collecting and Analyzing JavaScript Code from 10K Sites for 9 Months

  • 1. Athens University of Economics and Business
  • 2. Greek Research and Technology Network

Description

We present a massive dataset (~2 TB) of client-side JavaScript code. Specifically, we have collected and stored on a daily basis JavaScript code from Alexa's Top 10000 web sites (~7.5 GB per day) for nine consecutive months. Our collection involved both inline scripts extracted from each web site's main page and external scripts linked from it. To help researchers identify similar scripts and examine their popularity and evolution, we have produced hashes that represent the scripts' logical structure. Furthermore, we have analyzed the resulting dataset with well-established static analysis tools, generating additional metadata including reports with quality bugs and vulnerable libraries.
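
The description does not say how the structural hashes were computed. As an illustration only, the following Python sketch shows one way a hash can capture a script's shape rather than its exact text: comments and whitespace are dropped, and identifiers, strings and numbers are replaced by placeholders before hashing. The function name structure_hash, the token pattern, the keyword list and the choice of SHA-256 are all assumptions made for this sketch, not the dataset's actual method, and the regex tokenizer is deliberately crude (it does not handle regex literals or template strings).

```python
import hashlib
import re

# Hypothetical, simplified JavaScript token pattern (an assumption of this
# sketch): comments, string literals, numbers, names, then any other
# single non-space character such as an operator or bracket.
TOKEN = re.compile(
    r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<name>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<punct>\S)
    """,
    re.VERBOSE | re.DOTALL,
)

# A small, incomplete keyword list; keywords are kept because they carry
# structure, while all other names are treated as interchangeable.
KEYWORDS = {"var", "let", "const", "function", "return",
            "if", "else", "for", "while", "new"}


def structure_hash(source: str) -> str:
    """Return a SHA-256 hex digest of a normalized token sequence."""
    parts = []
    for match in TOKEN.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind == "comment":
            continue  # comments do not affect structure
        if kind == "string":
            parts.append("S")  # any string literal
        elif kind == "number":
            parts.append("N")  # any numeric literal
        elif kind == "name":
            parts.append(text if text in KEYWORDS else "I")
        else:
            parts.append(text)  # punctuation and operators as-is
    return hashlib.sha256(" ".join(parts).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = structure_hash("var a = 1; // counter")
    b = structure_hash("var total=42;")
    print(a == b)
```

Under this normalization, two scripts that differ only in variable names, literal values, comments or spacing receive the same hash, which is the kind of property needed to group near-identical scripts across sites and days.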

Files

defects-jshint.json

Files (61.3 GB)

Checksum                                Size
md5:a342f46c1cd00d322d00db54995f096b    5.0 MB
md5:fb9aa18838570dcf633f81f5b4c54b37    358.6 MB
md5:0af19c80475a5397dbfe585f24d113c5    536.5 MB
md5:6e17d9020e0b4f0f41fd5007f9d3d2e3    60.4 GB
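
Since the largest file is about 60 GB, it may be worth checking a download against its md5: checksum from the listing above before using it. The following is a minimal Python sketch using only the standard library; the helper names md5_of_file and matches_listing and the local path in the usage comment are hypothetical, and the listing does not preserve which file name belongs to which checksum.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that very large files never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a local file against a checksum written as 'md5:<hex>'
    (a bare hex string is accepted too)."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (the path is a placeholder for wherever the file was saved):
# matches_listing("path/to/downloaded/file",
#                 "md5:6e17d9020e0b4f0f41fd5007f9d3d2e3")
```

A False result most likely indicates a truncated or corrupted download, in which case the file should be fetched again.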