Conference paper Open Access

Code Duplication and Reuse in Jupyter Notebooks

Koenzen, Andreas P.; Ernst, Neil A.; Storey, Margaret-Anne D.


Citation Style Language JSON Export

{
  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.3836691", 
  "language": "eng", 
  "title": "Code Duplication and Reuse in Jupyter Notebooks", 
  "issued": {
    "date-parts": [
      [
        2020, 
        5, 
        29
      ]
    ]
  }, 
  "abstract": "<p>This is a replication package for the paper: &quot;Code Duplication and Reuse in Jupyter Notebooks&quot;, which was accepted as a full paper at the&nbsp;<strong>IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) 2020</strong>.</p>\n\n<p>The contents of this package are as follows:</p>\n\n<ul>\n\t<li><strong>code</strong> folder: Contains all necessary code to reproduce the first study presented in the paper.</li>\n\t<li><strong>data</strong> folder: Contains all data pertaining to the first study presented in the paper.\n\t<ul>\n\t\t<li><strong>clones_1582405629.json.gz</strong> file: JSON database with all detected clones and their&nbsp;metadata for the dataset used.</li>\n\t\t<li><strong>commit_data_1589997765.pkl.gz</strong> file: Pandas pickle file containing the table &quot;commit_data&quot; (See <em>database.sql</em> file).</li>\n\t\t<li><strong>commits_1589997765.pkl.gz</strong> file: Pandas pickle file containing the table &quot;commit&quot; (See <em>database.sql</em> file).</li>\n\t\t<li><strong>counter_1582422799.json.gz</strong> file: JSON database with statistics about all repositories in the dataset used.</li>\n\t\t<li><strong>notebooks_1589997765.pkl.gz</strong> file: Pandas pickle file containing&nbsp;the table &quot;notebooks&quot; (See <em>database.sql</em> file).</li>\n\t\t<li><strong>parameter_tunning</strong> folder: Folder with the results of the parameter tuning phase. Each TXT file corresponds to a different threshold.</li>\n\t</ul>\n\t</li>\n</ul>\n\n<p>To run the code, a fully functional <strong>Python 3.7</strong> environment is needed. The requirements can be found in the <strong>requirements.txt</strong> file. If the start scripts are to be used, <strong>Python 3.7.7</strong> must be installed via <strong>pyenv</strong>. This is NOT&nbsp;necessary to run the notebooks:&nbsp;the <strong>JupyterLab</strong> environment can be launched manually by issuing the&nbsp;command: <strong>&quot;jupyter lab notebooks&quot;</strong></p>\n\n<p>Commands:</p>\n\n<ol>\n\t<li>To install Python dependencies via Pip: <strong>&quot;pip install -r requirements.txt&quot;</strong></li>\n\t<li>To launch Jupyter: <strong>&quot;source start-jupyter.sh&quot;</strong></li>\n</ol>\n\n<p>Optional:</p>\n\n<ol>\n\t<li>To access environment variables from Jupyter, the file <strong>env_variables.py</strong> can be edited to add new variables or modify current ones.</li>\n</ol>\n\n<p><strong>SHA1SUM of ZIP file:</strong>&nbsp;c9b5d7e2dbe0574b73f2d2b67adb9e18fdcfb513</p>", 
  "author": [
    {
      "family": "Koenzen, Andreas P."
    }, 
    {
      "family": "Ernst, Neil A."
    }, 
    {
      "family": "Storey, Margaret-Anne D."
    }
  ], 
  "id": "3836691", 
  "event-place": "Dunedin, New Zealand", 
  "version": "3.0", 
  "type": "paper-conference", 
  "event": "IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)"
}
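
The abstract lists a SHA1 checksum for the ZIP file and several gzip-compressed JSON data files. The following is a minimal sketch of how one might verify the download and open one of those JSON files using only the Python standard library. The archive name `replication_package.zip` is hypothetical (the record does not state it), and the sketch assumes each `.json.gz` file holds a single JSON document, which the record does not confirm.

```python
import gzip
import hashlib
import json
from pathlib import Path

# Checksum quoted in the record's abstract.
EXPECTED_SHA1 = "c9b5d7e2dbe0574b73f2d2b67adb9e18fdcfb513"


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_gzipped_json(path):
    """Decompress a .json.gz file and parse it as one JSON document (assumption)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    archive = Path("replication_package.zip")  # hypothetical file name
    if archive.exists():
        print("checksum matches:", sha1_of_file(archive) == EXPECTED_SHA1)

    # Assumes the ZIP was extracted into the current directory.
    clones = Path("data/clones_1582405629.json.gz")
    if clones.exists():
        print("top-level JSON type:", type(load_gzipped_json(clones)).__name__)
```

The `.pkl.gz` tables can presumably be read with `pandas.read_pickle`, which infers gzip compression from the file extension by default. Using the Python 3.7 environment and pinned requirements named in the abstract is the safer choice for unpickling.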