Dataset Open Access
Konstantin Grotov;
Sergey Titov;
Vladimir Sotnikov;
Yaroslav Golubev;
Timofey Bryksin
This archive contains the dataset of properly licensed Jupyter notebooks from the MSR'22 paper "A Large-Scale Comparison of Python Code in Jupyter Notebooks and Scripts". The dataset contains 847,881 notebooks stored in a PostgreSQL dump file. Details about the database are in the README file. To transform the notebooks into this format and to calculate the structural metrics, we used our library, Matroskin, which is available at https://github.com/JetBrains-Research/Matroskin.
Name | Size | MD5 |
---|---|---|
dataset.sql | 17.4 GB | d37786d344b38ad59d07e58d1070b7b4 |
README.md | 4.8 kB | ca8f5c62324a2c3ef6ccb4cd216ce6db |
stylistic.csv | 1.0 GB | 3771674dcf34db47955cd6d0fb43e308 |
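Because the dump is large, it can be worth checking each download against the MD5 checksums listed above before loading it. The following is a minimal sketch, assuming the files were saved under their listed names in a single directory; the helper names `md5_of_file` and `verify_downloads` are illustrative and not part of the dataset or of Matroskin.

```python
import hashlib
from pathlib import Path

# MD5 digests as published in the file listing for this record.
EXPECTED_MD5 = {
    "dataset.sql": "d37786d344b38ad59d07e58d1070b7b4",
    "README.md": "ca8f5c62324a2c3ef6ccb4cd216ce6db",
    "stylistic.csv": "3771674dcf34db47955cd6d0fb43e308",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a 17 GB dump never sits in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file to True (match), False (mismatch), or None (not found)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of_file(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, ok in verify_downloads().items():
        print(f"{name}: {labels[ok]}")
```

A mismatch most likely indicates an interrupted or corrupted download, in which case the file should be fetched again before it is restored into PostgreSQL.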