Conference paper Open Access

Code Duplication and Reuse in Jupyter Notebooks

Koenzen, Andreas P.; Ernst, Neil A.; Storey, Margaret-Anne D.


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Koenzen, Andreas P.</dc:creator>
  <dc:creator>Ernst, Neil A.</dc:creator>
  <dc:creator>Storey, Margaret-Anne D.</dc:creator>
  <dc:date>2020-05-29</dc:date>
  <dc:description>This is the replication package for the paper "Code Duplication and Reuse in Jupyter Notebooks", accepted as a full paper at the IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) 2020.

The contents of this package are as follows:


	code folder: Contains all necessary code to reproduce the first study presented in the paper.
	data folder: Contains all data pertaining to the first study presented in the paper.
	
		clones_1582405629.json.gz file: JSON database with all detected clones and their metadata for the dataset used.
		commit_data_1589997765.pkl.gz file: Pandas pickle file containing the table "commit_data" (see the database.sql file).
		commits_1589997765.pkl.gz file: Pandas pickle file containing the table "commit" (see the database.sql file).
		counter_1582422799.json.gz file: JSON database with statistics about all repositories in the dataset used.
		notebooks_1589997765.pkl.gz file: Pandas pickle file containing the table "notebooks" (see the database.sql file).
		parameter_tunning folder: Folder with the results of the parameter tuning phase. Each TXT file corresponds to a different threshold.
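
The data files listed above can be opened with pandas and the Python standard library. The following is a minimal sketch, assuming the archive has been extracted so that the files sit in a local data directory and that the pandas version pinned in requirements.txt is installed (pickled DataFrames are not guaranteed to load across pandas versions). The helper names load_table and load_json_gz are hypothetical, and because the internal layout of the JSON databases is not described here, they are returned exactly as parsed. Only unpickle files obtained from a trusted source.

```python
import gzip
import json
from pathlib import Path

import pandas as pd

# Assumption: the archive has been extracted and its data folder is ./data
DEFAULT_DATA_DIR = Path("data")


def load_table(stem: str, data_dir: Path = DEFAULT_DATA_DIR) -> pd.DataFrame:
    """Load a gzipped pandas pickle such as "notebooks_1589997765" (hypothetical helper)."""
    # read_pickle infers gzip compression from the ".gz" suffix
    return pd.read_pickle(data_dir / f"{stem}.pkl.gz")


def load_json_gz(stem: str, data_dir: Path = DEFAULT_DATA_DIR):
    """Load a gzipped JSON database such as "clones_1582405629" (hypothetical helper)."""
    # Open the gzip stream in text mode so json can parse it directly
    with gzip.open(data_dir / f"{stem}.json.gz", "rt", encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__" and DEFAULT_DATA_DIR.is_dir():
    notebooks = load_table("notebooks_1589997765")
    commits = load_table("commits_1589997765")
    commit_data = load_table("commit_data_1589997765")
    clones = load_json_gz("clones_1582405629")
    counter = load_json_gz("counter_1582422799")
    print(notebooks.shape, commits.shape, commit_data.shape)
```

Passing data_dir explicitly allows the same helpers to be used from the notebooks folder, where the relative path to the data differs.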
	
	


To fully reproduce the study, a working Python 3.7 environment is required; the dependencies are listed in the requirements.txt file. The start-up scripts additionally require Python 3.7.7, installed via pyenv. This is NOT necessary to run the notebooks: the JupyterLab environment can also be launched manually with the command "jupyter lab notebooks".

Commands:


	To install the Python dependencies with pip: "pip install -r requirements.txt"
	To launch Jupyter with the start-up script: "source start-jupyter.sh"


Optional:


	To make environment variables available in Jupyter, edit the file env_variables.py to add new variables or modify existing ones.


SHA-1 checksum (sha1sum) of the ZIP file: c9b5d7e2dbe0574b73f2d2b67adb9e18fdcfb513</dc:description>
  <dc:identifier>https://zenodo.org/record/3836691</dc:identifier>
  <dc:identifier>10.5281/zenodo.3836691</dc:identifier>
  <dc:identifier>oai:zenodo.org:3836691</dc:identifier>
  <dc:language>eng</dc:language>
  <dc:relation>doi:10.5281/zenodo.3836690</dc:relation>
  <dc:relation>url:https://zenodo.org/communities/msr</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>https://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
  <dc:subject>Jupyter, computational notebooks, code duplication, code clones, code reuse, data analysis, data exploration, exploratory programming</dc:subject>
  <dc:title>Code Duplication and Reuse in Jupyter Notebooks</dc:title>
  <dc:type>info:eu-repo/semantics/conferencePaper</dc:type>
  <dc:type>publication-conferencepaper</dc:type>
</oai_dc:dc>
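The SHA-1 checksum given at the end of the description can be used to confirm that the downloaded ZIP archive is intact before extracting it. The following is a minimal Python sketch that is equivalent in effect to running sha1sum on the file; the archive's file name is not stated in the record, so the path is passed on the command line, and the function names are hypothetical.

```python
import hashlib
import sys
from pathlib import Path

# Checksum published in the record description
EXPECTED_SHA1 = "c9b5d7e2dbe0574b73f2d2b67adb9e18fdcfb513"


def sha1_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-1 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # Stream the file so a large archive is never held in memory at once
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, expected: str = EXPECTED_SHA1) -> bool:
    """Compare the file's digest with the expected one, ignoring hex case."""
    return sha1_of_file(path) == expected.lower()


if __name__ == "__main__" and len(sys.argv) > 1:
    archive = Path(sys.argv[1])  # path to the downloaded ZIP file
    print("OK" if verify_archive(archive) else "MISMATCH")
```

Reading in chunks keeps memory use constant regardless of archive size; the digest is identical to hashing the whole file in one call.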