Code Duplication and Reuse in Jupyter Notebooks

doi:10.5281/zenodo.3836691

Published May 29, 2020 | Version 3.0

Conference paper | Open Access

1. University of Victoria

This is a replication package for the paper: "Code Duplication and Reuse in Jupyter Notebooks", which was accepted as a full paper at the IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) 2020.

The contents of this package are as follows:

- code folder: Contains all the code needed to reproduce the first study presented in the paper.
- data folder: Contains all data pertaining to the first study presented in the paper.
  - clones_1582405629.json.gz file: JSON database with all detected clones and their metadata for the dataset used.
  - commit_data_1589997765.pkl.gz file: pandas pickle file containing the table "commit_data" (see the database.sql file).
  - commits_1589997765.pkl.gz file: pandas pickle file containing the table "commit" (see the database.sql file).
  - counter_1582422799.json.gz file: JSON database with statistics about all repositories in the dataset used.
  - notebooks_1589997765.pkl.gz file: pandas pickle file containing the table "notebooks" (see the database.sql file).
  - parameter_tunning folder: Results of the parameter tuning phase. Each TXT file corresponds to a different threshold.
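A minimal sketch for loading the compressed data files with the Python standard library. It assumes, as the extensions suggest, that each .json.gz file holds a single gzip-compressed JSON document and each .pkl.gz file is a gzip-compressed pickle. The helper names load_json_gz and load_pickle_gz are hypothetical and not part of the package.

```python
import gzip
import json
import pickle


def load_json_gz(path):
    """Load a gzip-compressed JSON file such as clones_1582405629.json.gz.

    Assumption: the file contains one JSON document (not JSON Lines).
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def load_pickle_gz(path):
    """Unpickle a gzip-compressed pickle such as notebooks_1589997765.pkl.gz.

    The package's pickles hold pandas objects, so pandas must be installed
    for them to be reconstructed. Only unpickle files from trusted sources.
    """
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)
```

For instance, load_pickle_gz("data/notebooks_1589997765.pkl.gz") should return the "notebooks" table, presumably as a pandas DataFrame; the relative path assumes the ZIP was extracted into the current directory. pandas.read_pickle, which infers gzip compression from the .gz extension by default, is an equivalent alternative for the .pkl.gz files.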

To fully reproduce the study, a working Python 3.7 environment is needed; the required packages are listed in the requirements.txt file. Using the provided start scripts requires Python 3.7.7 installed via pyenv. This is NOT necessary to run the notebooks: the JupyterLab environment can also be launched manually with the command "jupyter lab notebooks".
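Before installing the requirements, it can help to confirm that the interpreter matches the Python 3.7 requirement. A minimal sketch; the function name matches_required_python is hypothetical, and it compares only the major and minor version, so it does not check for the exact 3.7.7 release the start scripts expect.

```python
import sys

# Assumption: only major.minor is compared; the start scripts additionally
# expect the exact 3.7.7 release installed through pyenv.
REQUIRED_MAJOR_MINOR = (3, 7)


def matches_required_python(version_info=None, required=REQUIRED_MAJOR_MINOR):
    """Return True if the interpreter's major.minor version equals `required`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(required)
```

Calling matches_required_python() with no arguments checks the running interpreter; passing a version tuple makes the check easy to reuse, for example on the output of a pyenv-managed interpreter.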

Commands:

To install the Python dependencies with pip: "pip install -r requirements.txt"
To launch Jupyter: "source start-jupyter.sh"

Optional:

To make environment variables available in Jupyter, edit the file env_variables.py to add new variables or modify existing ones.

SHA1SUM of ZIP file: c9b5d7e2dbe0574b73f2d2b67adb9e18fdcfb513
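The downloaded archive can be checked against the SHA-1 checksum above and the MD5 checksum listed for the ZIP file. A minimal sketch using Python's hashlib; it reads the 2.5 GB file in 1 MiB chunks so the archive is never held in memory at once. The names file_digest and verify are illustrative. On Linux, running sha1sum on the ZIP file gives the same SHA-1 value.

```python
import hashlib

EXPECTED_SHA1 = "c9b5d7e2dbe0574b73f2d2b67adb9e18fdcfb513"
EXPECTED_MD5 = "18cccb23601930f522ae345b83fe91bc"


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Return the hex digest of a file, hashing it chunk by chunk."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected, algorithm):
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return file_digest(path, algorithm) == expected.strip().lower()
```

For example, verify("VLHCC_2020_Paper_Reproducibility_Pkg.zip", EXPECTED_SHA1, "sha1") and verify("VLHCC_2020_Paper_Reproducibility_Pkg.zip", EXPECTED_MD5, "md5") should both return True for an intact download in the current directory.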

Files

VLHCC_2020_Paper_Reproducibility_Pkg.zip (2.5 GB, MD5: 18cccb23601930f522ae345b83fe91bc)