Published December 24, 2021 | Version v1
Dataset Open

Datasets for the paper "ReSplit: Improving the Structure of Jupyter Notebooks by Re-Splitting Their Cells"

  • 1. JetBrains Research
  • 2. JetBrains Research, HSE University

Description

In this archive, you can find all the data used in the paper "ReSplit: Improving the Structure of Jupyter Notebooks by Re-Splitting Their Cells".

sklearn_full_cells.csv is the dataset from the paper by Pimentel et al., filtered to include only Data Science notebooks.
complete.csv is the dataset obtained after the full run of ReSplit on the dataset: both merging and splitting.
split.csv is the dataset obtained after running only the splitting part of ReSplit.
merged.csv is the dataset obtained after running only the merging part of ReSplit.
duplicates_id.csv contains the IDs of the duplicate notebooks for deduplication.
changes.csv contains the IDs of the notebooks, as well as their lengths before and after running ReSplit.
survey.csv is the table with the results of the survey.

In the dataset CSVs, each row describes one cell and contains the cell's unique identifier and the identifier of the corresponding notebook.
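
Because each row is a single cell, rebuilding a notebook means grouping rows by the notebook identifier. The following is a minimal sketch of that grouping using only the Python standard library. The column names cell_id and notebook_id are assumptions made for illustration and are not taken from the archive, so check the header row of the actual files first. The optional duplicate_ids argument is likewise a hypothetical way of applying the IDs listed in duplicates_id.csv.

```python
import csv
from collections import defaultdict

# Assumed column names (hypothetical): replace them with the names
# found in the header row of the actual CSV files.
CELL_ID_COL = "cell_id"
NOTEBOOK_ID_COL = "notebook_id"


def group_cells_by_notebook(rows, duplicate_ids=frozenset()):
    """Map each notebook ID to the list of its cell IDs, in row order.

    Rows whose notebook ID appears in duplicate_ids are skipped.
    """
    notebooks = defaultdict(list)
    for row in rows:
        notebook_id = row[NOTEBOOK_ID_COL]
        if notebook_id in duplicate_ids:
            continue
        notebooks[notebook_id].append(row[CELL_ID_COL])
    return dict(notebooks)


def load_notebooks(csv_path, duplicate_ids=frozenset()):
    """Read one of the dataset CSVs and group its cells by notebook."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return group_cells_by_notebook(csv.DictReader(f), duplicate_ids)
```

For example, load_notebooks("complete.csv") would return a dictionary keyed by notebook ID, provided the assumed column names match the file. If individual fields turn out to be very long, the limit set by csv.field_size_limit may need to be raised before reading.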

Files

data.zip

Files (3.5 GB)

Name: data.zip
Size: 3.5 GB
md5:667e0633e8a93beba485cee31f6cb9e7
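
Since the archive is large, it can be worth checking the download against the MD5 checksum listed above before unpacking it. The sketch below reads the file in chunks so that the whole 3.5 GB archive is not loaded into memory. The function names and the chunk size are illustrative choices, not part of the dataset.

```python
import hashlib

# Checksum listed for data.zip on this page.
EXPECTED_MD5 = "667e0633e8a93beba485cee31f6cb9e7"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Check whether the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling matches_expected("data.zip") returns True only if the local copy has the listed checksum. A mismatch usually indicates an incomplete or corrupted download.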