Conference paper Open Access

Code Duplication and Reuse in Jupyter Notebooks

Koenzen, Andreas P.; Ernst, Neil A.; Storey, Margaret-Anne D.

This is a replication package for the paper: "Code Duplication and Reuse in Jupyter Notebooks", which was accepted as a full paper at the IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) 2020.

The contents of this package are as follows:

  • code folder: Contains all necessary code to reproduce the first study presented in the paper.
  • data folder: Contains all data pertaining to the first study presented in the paper.
    • clones_1582405629.json.gz file: JSON database with all detected clones and their metadata for the dataset used.
    • commit_data_1589997765.pkl.gz file: Pandas pickle file containing the table "commit_data" (See database.sql file).
    • commits_1589997765.pkl.gz file: Pandas pickle file containing the table "commit" (See database.sql file).
    • counter_1582422799.json.gz file: JSON database with statistics about all repositories in the dataset used.
    • notebooks_1589997765.pkl.gz file: Pandas pickle file containing the table "notebooks" (See database.sql file).
    • parameter_tunning folder: Folder with the results of the parameter tuning phase. Each TXT file corresponds to a different threshold.
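
The data files listed above could be loaded along the following lines. This is only a sketch, not code from the package: the helper names load_json_gz and load_table and the DATA_DIR location are hypothetical, each JSON file is assumed to hold a single JSON document, and the pickle files are assumed to load with the pandas version pinned in requirements.txt.

```python
import gzip
import json
from pathlib import Path

# Assumption: the ZIP file was extracted into the current working directory.
DATA_DIR = Path("data")


def load_json_gz(path):
    """Read a gzip-compressed JSON file and return the decoded object.

    Assumes the file holds one JSON document, not one document per line.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def load_table(path):
    """Read a gzip-compressed pandas pickle (hypothetical helper)."""
    # Imported lazily so the JSON helper also works without pandas installed.
    import pandas as pd

    # read_pickle infers gzip compression from the ".gz" file suffix.
    return pd.read_pickle(path)
```

For example, load_json_gz(DATA_DIR / "clones_1582405629.json.gz") would return the clone database, and load_table(DATA_DIR / "notebooks_1589997765.pkl.gz") would return the "notebooks" table. Unpickling executes code stored in the file, so load_table should only be used on files from a verified copy of the package.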

To fully reproduce the study, a working Python 3.7 environment is needed. The dependencies are listed in the requirements.txt file. The starting scripts additionally require Python 3.7.7 installed via pyenv. This is NOT necessary to run the notebooks: the JupyterLab environment can be launched manually by issuing the command: "jupyter lab notebooks"

Commands:

  1. To install Python dependencies via Pip: "pip install -r requirements.txt"
  2. To launch Jupyter: "source start-jupyter.sh"

Optional:

  1. To access environment variables from Jupyter, the file env_variables.py can be edited to add new variables or modify current ones.

SHA1SUM of ZIP file: c9b5d7e2dbe0574b73f2d2b67adb9e18fdcfb513
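
The downloaded archive can be checked against the SHA1 sum stated above before it is extracted. The sketch below uses only the Python standard library; the function names file_sha1 and matches_expected are hypothetical, and the path passed to them is wherever the ZIP file was saved.

```python
import hashlib

# SHA1 sum of the ZIP file as stated in this package description.
EXPECTED_SHA1 = "c9b5d7e2dbe0574b73f2d2b67adb9e18fdcfb513"


def file_sha1(path, chunk_size=1 << 20):
    """Return the hex SHA1 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use low for the 2.5 GB archive.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA1):
    """Return True if the file's SHA1 digest equals the expected value."""
    return file_sha1(path) == expected.lower()
```

Calling matches_expected("VLHCC_2020_Paper_Reproducibility_Pkg.zip") returns True only if the local copy matches the published sum.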

Files (2.5 GB)
VLHCC_2020_Paper_Reproducibility_Pkg.zip (2.5 GB), md5:18cccb23601930f522ae345b83fe91bc