There is a newer version of this record available.

Dataset Open Access

Dataset of Jupyter Notebooks from the paper "A Large-Scale Comparison of Python Code in Jupyter Notebooks and Scripts"

Konstantin Grotov; Sergey Titov; Vladimir Sotnikov; Yaroslav Golubev; Timofey Bryksin

This archive contains the dataset of properly licensed Jupyter notebooks from the MSR'22 paper "A Large-Scale Comparison of Python Code in Jupyter Notebooks and Scripts". The dataset contains 847,881 notebooks stored in a PostgreSQL dump file. Details about the database are given in the README file.

To transform the notebooks into this convenient format and to calculate the structural metrics, we used our library, Matroskin, which is available at https://github.com/JetBrains-Research/Matroskin.

Files (18.4 GB)

Name            Size      MD5
dataset.sql     17.4 GB   d37786d344b38ad59d07e58d1070b7b4
README.md       4.8 kB    ca8f5c62324a2c3ef6ccb4cd216ce6db
stylistic.csv   1.0 GB    3771674dcf34db47955cd6d0fb43e308
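
Because the main dump is 17.4 GB, it can be worth checking a download against the MD5 digests listed with each file before loading it. The following is a minimal sketch of such a check, assuming the three files were saved under their listed names in one directory; the helper names `md5_of_file` and `verify_downloads` are hypothetical and not part of the dataset or of Matroskin.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing of this record.
EXPECTED_MD5 = {
    "dataset.sql": "d37786d344b38ad59d07e58d1070b7b4",
    "README.md": "ca8f5c62324a2c3ef6ccb4cd216ce6db",
    "stylistic.csv": "3771674dcf34db47955cd6d0fb43e308",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so the 17.4 GB dump never sits in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each file name to True (match), False (mismatch), or None (not found)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of_file(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, ok in verify_downloads().items():
        print(f"{name}: {labels[ok]}")
```

A mismatch most likely indicates an interrupted or corrupted download, in which case the file should be fetched again before the dump is restored.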
Statistics         All versions   This version
Views              469            459
Downloads          42             40
Data volume        501.4 GB       477.9 GB
Unique views       282            274
Unique downloads   22             21
