Conference paper Open Access

Code Duplication and Reuse in Jupyter Notebooks

Koenzen, Andreas P.; Ernst, Neil A.; Storey, Margaret-Anne D.


JSON-LD (schema.org) Export

{
  "inLanguage": {
    "alternateName": "eng", 
    "@type": "Language", 
    "name": "English"
  }, 
  "description": "<p>This is a replication package for the paper: &quot;Code Duplication and Reuse in Jupyter Notebooks&quot;, which was accepted as a full paper at the&nbsp;<strong>IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) 2020</strong>.</p>\n\n<p>The contents of this package are as follows:</p>\n\n<ul>\n\t<li><strong>code</strong> folder: Contains all necessary code to reproduce the first study presented in the paper.</li>\n\t<li><strong>data</strong> folder: Contains all data pertaining to the first study presented in the paper.\n\t<ul>\n\t\t<li><strong>clones_1582405629.json.gz</strong> file: JSON database with all detected clones and their&nbsp;metadata for the dataset used.</li>\n\t\t<li><strong>commit_data_1589997765.pkl.gz</strong> file: Pandas pickle file containing the table &quot;commit_data&quot; (See <em>database.sql</em> file).</li>\n\t\t<li><strong>commits_1589997765.pkl.gz</strong> file: Pandas pickle file containing the table &quot;commit&quot; (See <em>database.sql</em> file).</li>\n\t\t<li><strong>counter_1582422799.json.gz</strong> file: JSON database with statistics about all repositories in the dataset used.</li>\n\t\t<li><strong>notebooks_1589997765.pkl.gz</strong> file: Pandas pickle file containing&nbsp;the table &quot;notebooks&quot; (See <em>database.sql</em> file).</li>\n\t\t<li><strong>parameter_tunning</strong> folder: Folder with the results of the parameter tuning phase. Each TXT file corresponds to a different threshold.</li>\n\t</ul>\n\t</li>\n</ul>\n\n<p>To fully reproduce the results, a working <strong>Python 3.7</strong> environment is needed. The requirements are listed in the <strong>requirements.txt</strong> file. To use the start scripts, <strong>Python 3.7.7</strong> must be installed via <strong>pyenv</strong>. This is NOT&nbsp;necessary to run the notebooks:&nbsp;the <strong>JupyterLab</strong> environment can be launched manually by issuing the&nbsp;command <strong>&quot;jupyter lab notebooks&quot;</strong>.</p>\n\n<p>Commands:</p>\n\n<ol>\n\t<li>To install Python dependencies via Pip: <strong>&quot;pip install -r requirements.txt&quot;</strong></li>\n\t<li>To launch Jupyter: <strong>&quot;source start-jupyter.sh&quot;</strong></li>\n</ol>\n\n<p>Optional:</p>\n\n<ol>\n\t<li>To access environment variables from Jupyter, the file <strong>env_variables.py</strong> can be edited to add new variables or modify current ones.</li>\n</ol>\n\n<p><strong>SHA1SUM of ZIP file:</strong>&nbsp;c9b5d7e2dbe0574b73f2d2b67adb9e18fdcfb513</p>", 
  "license": "https://creativecommons.org/licenses/by/4.0/legalcode", 
  "creator": [
    {
      "affiliation": "University of Victoria", 
      "@type": "Person", 
      "name": "Koenzen, Andreas P."
    }, 
    {
      "affiliation": "University of Victoria", 
      "@type": "Person", 
      "name": "Ernst, Neil A."
    }, 
    {
      "affiliation": "University of Victoria", 
      "@type": "Person", 
      "name": "Storey, Margaret-Anne D."
    }
  ], 
  "headline": "Code Duplication and Reuse in Jupyter Notebooks", 
  "image": "https://zenodo.org/static/img/logos/zenodo-gradient-round.svg", 
  "datePublished": "2020-05-29", 
  "url": "https://zenodo.org/record/3836691", 
  "version": "3.0", 
  "@type": "ScholarlyArticle", 
  "keywords": [
    "Jupyter", 
    "computational notebooks", 
    "code duplication", 
    "code clones", 
    "code reuse", 
    "data analysis", 
    "data exploration", 
    "exploratory programming"
  ], 
  "@context": "https://schema.org/", 
  "identifier": "https://doi.org/10.5281/zenodo.3836691", 
  "@id": "https://doi.org/10.5281/zenodo.3836691", 
  "workFeatured": {
    "url": "https://conf.researchr.org/home/vlhcc2020", 
    "alternateName": "VL/HCC", 
    "location": "Dunedin, New Zealand", 
    "@type": "Event", 
    "name": "IEEE Symposium on Visual Languages and Human-Centric Computing"
  }, 
  "name": "Code Duplication and Reuse in Jupyter Notebooks"
}
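
The description above publishes a SHA-1 checksum for the ZIP file and lists gzip-compressed JSON files in the data folder. The following is a minimal sketch, not part of the replication package, of how one might verify the download and open one of those files using only the Python standard library. The helper names `sha1_of_file` and `load_json_gz` are hypothetical, and nothing is assumed here about the internal structure of the JSON databases.

```python
import gzip
import hashlib
import json

# Checksum copied from the record description ("SHA1SUM of ZIP file").
EXPECTED_SHA1 = "c9b5d7e2dbe0574b73f2d2b67adb9e18fdcfb513"


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks.

    Hypothetical helper; chunked reading keeps memory use low for a
    multi-gigabyte archive.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_json_gz(path):
    """Parse a gzip-compressed JSON file and return the decoded object.

    Hypothetical helper; the layout of the decoded object is not
    documented in the record, so inspect it before relying on it.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

Assuming the archive was saved under some local name, `sha1_of_file(local_name) == EXPECTED_SHA1` should hold for an intact download, and `load_json_gz("data/counter_1582422799.json.gz")` would read the repository statistics once the archive is unpacked (the relative path is an assumption based on the folder listing). The `.pkl.gz` files are described as Pandas pickle files; `pandas.read_pickle` infers gzip compression from the `.gz` extension, though loading them may require a pandas version compatible with the one pinned in requirements.txt.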
                   All versions   This version
Views              83             83
Downloads          10             10
Data volume        24.5 GB        24.5 GB
Unique views       76             76
Unique downloads   9              9
