There is a newer version of this record available.

Dataset Open Access

Dataset of Jupyter Notebooks from the paper "A Large-Scale Comparison of Python Code in Jupyter Notebooks and Scripts"

Konstantin Grotov; Sergey Titov; Vladimir Sotnikov; Yaroslav Golubev; Timofey Bryksin

This archive contains the dataset of properly licensed Jupyter notebooks from the MSR'22 paper "A Large-Scale Comparison of Python Code in Jupyter Notebooks and Scripts". The dataset contains 847,881 notebooks stored in a PostgreSQL dump file (dataset.sql). Details about the database are given in the README file.

To transform the notebooks into this convenient format and to calculate the structural metrics, we used our library, Matroskin, which is available at https://github.com/JetBrains-Research/Matroskin.

Files (18.4 GB)

Name           Size     MD5
dataset.sql    17.4 GB  d37786d344b38ad59d07e58d1070b7b4
README.md      4.8 kB   ca8f5c62324a2c3ef6ccb4cd216ce6db
stylistic.csv  1.0 GB   3771674dcf34db47955cd6d0fb43e308
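
Because the dump is large, it may be worth checking the downloaded files against the MD5 checksums listed for this record before restoring the database. The following is a minimal sketch, not part of the dataset's tooling; the function names `md5_of_file` and `verify_downloads` are hypothetical, and it assumes the files sit in the current directory under their listed names.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in the file table of this record.
EXPECTED_MD5 = {
    "dataset.sql": "d37786d344b38ad59d07e58d1070b7b4",
    "README.md": "ca8f5c62324a2c3ef6ccb4cd216ce6db",
    "stylistic.csv": "3771674dcf34db47955cd6d0fb43e308",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file to True/False (checksum match) or None (file missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        # Assumption: files keep their original names after download.
        results[name] = md5_of_file(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, ok in verify_downloads().items():
        print(f"{name}: {labels[ok]}")
```

Reading in 1 MiB chunks keeps memory use flat even for the 17.4 GB dump; a mismatch usually indicates an incomplete or corrupted download.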
