Published March 25, 2022 | Version v1
Dataset
Open
Dataset of Jupyter Notebooks from the paper "A Large-Scale Comparison of Python Code in Jupyter Notebooks and Scripts"
- 1. JetBrains Research, ITMO University
- 2. JetBrains Research
Description
This archive contains the dataset of properly licensed Jupyter notebooks from the MSR'22 paper "A Large-Scale Comparison of Python Code in Jupyter Notebooks and Scripts". The dataset contains 847,881 notebooks stored in a PostgreSQL dump file. Details about the database are given in the README file. To transform the notebooks into this convenient format and to calculate the structural metrics, we used our library, Matroskin, which is available at https://github.com/JetBrains-Research/Matroskin.
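Since the notebooks ship as a PostgreSQL dump, they need to be loaded into a running PostgreSQL server before they can be queried. The following is a minimal sketch of one way to do that from Python, not the authors' documented procedure. It assumes the dump is a single file, that the `psql` and `pg_restore` command-line tools are installed, and that the target database already exists. The function names, the file name `notebooks.dump`, and the database name `notebooks_db` are hypothetical. The README remains the authoritative source for the actual file name and schema.

```python
import subprocess
from pathlib import Path

# Custom-format archives written by `pg_dump -Fc` begin with these bytes.
CUSTOM_FORMAT_MAGIC = b"PGDMP"


def build_restore_command(dump_path, dbname):
    """Return the argument list that would load `dump_path` into `dbname`."""
    path = Path(dump_path)
    with path.open("rb") as handle:
        header = handle.read(len(CUSTOM_FORMAT_MAGIC))
    if header == CUSTOM_FORMAT_MAGIC:
        # Binary custom-format archive: pg_restore reads it directly.
        return ["pg_restore", "--no-owner", "--dbname", dbname, str(path)]
    # Assumption: anything else is a plain SQL script that psql can replay.
    return ["psql", "--dbname", dbname, "--file", str(path)]


def restore(dump_path, dbname):
    """Run the restore and raise CalledProcessError if the tool fails."""
    subprocess.run(build_restore_command(dump_path, dbname), check=True)
```

Calling `restore("notebooks.dump", "notebooks_db")` would then run the appropriate tool. The sketch does not handle tar-format or directory-format dumps, and it only inspects the first bytes of the file to choose between the two tools.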