There is a newer version of the record available.

Published March 25, 2022 | Version v1
Dataset Open

Dataset of Jupyter Notebooks from the paper "A Large-Scale Comparison of Python Code in Jupyter Notebooks and Scripts"

  • 1. JetBrains Research, ITMO University
  • 2. JetBrains Research

Description

This archive contains the dataset of properly licensed Jupyter notebooks from the MSR'22 paper "A Large-Scale Comparison of Python Code in Jupyter Notebooks and Scripts". The dataset contains 847,881 notebooks stored in a PostgreSQL dump file. Details about the database can be found in the README file.

To transform the notebooks into this convenient format and to calculate the structural metrics, we used our library, Matroskin, which is available at https://github.com/JetBrains-Research/Matroskin.

Files

README.md

Files (18.4 GB)

Checksum Size
md5:d37786d344b38ad59d07e58d1070b7b4 17.4 GB
md5:ca8f5c62324a2c3ef6ccb4cd216ce6db 4.8 kB
md5:3771674dcf34db47955cd6d0fb43e308 1.0 GB
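
The MD5 checksums listed for the files can be used to confirm that a download completed without corruption, which is worth doing for a 17.4 GB dump. The following is a minimal sketch in Python, not part of the dataset or of Matroskin. The helper names md5_of_file and matches_listed_checksum are hypothetical, and because the file names are not shown in the listing, the check only tests whether a file's digest matches any of the three listed values.

```python
import hashlib

# MD5 digests copied from the file listing (without the "md5:" prefix).
LISTED_MD5 = {
    "d37786d344b38ad59d07e58d1070b7b4",  # 17.4 GB file
    "ca8f5c62324a2c3ef6ccb4cd216ce6db",  # 4.8 kB file
    "3771674dcf34db47955cd6d0fb43e308",  # 1.0 GB file
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks.

    Chunked reading keeps memory use constant even for multi-gigabyte files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path):
    """Return True if the file's MD5 digest is one of the listed digests."""
    return md5_of_file(path) in LISTED_MD5
```

For example, calling matches_listed_checksum on the local path of a downloaded file should return True for an intact copy; a False result suggests the download should be repeated.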