Conference paper Open Access

Code Duplication and Reuse in Jupyter Notebooks

Koenzen, Andreas P.; Ernst, Neil A.; Storey, Margaret-Anne D.


JSON Export

{
  "files": [
    {
      "links": {
        "self": "https://zenodo.org/api/files/89e80b5d-38c2-4ddd-8591-6d0924735966/VLHCC_2020_Paper_Reproducibility_Pkg.zip"
      }, 
      "checksum": "md5:18cccb23601930f522ae345b83fe91bc", 
      "bucket": "89e80b5d-38c2-4ddd-8591-6d0924735966", 
      "key": "VLHCC_2020_Paper_Reproducibility_Pkg.zip", 
      "type": "zip", 
      "size": 2453258443
    }
  ], 
  "owners": [
    93395
  ], 
  "doi": "10.5281/zenodo.3836691", 
  "stats": {
    "version_unique_downloads": 9.0, 
    "unique_views": 76.0, 
    "views": 83.0, 
    "version_views": 83.0, 
    "unique_downloads": 9.0, 
    "version_unique_views": 76.0, 
    "volume": 24532584430.0, 
    "version_downloads": 10.0, 
    "downloads": 10.0, 
    "version_volume": 24532584430.0
  }, 
  "links": {
    "doi": "https://doi.org/10.5281/zenodo.3836691", 
    "conceptdoi": "https://doi.org/10.5281/zenodo.3836690", 
    "bucket": "https://zenodo.org/api/files/89e80b5d-38c2-4ddd-8591-6d0924735966", 
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.3836690.svg", 
    "html": "https://zenodo.org/record/3836691", 
    "latest_html": "https://zenodo.org/record/3836691", 
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.3836691.svg", 
    "latest": "https://zenodo.org/api/records/3836691"
  }, 
  "conceptdoi": "10.5281/zenodo.3836690", 
  "created": "2020-05-20T21:44:01.238701+00:00", 
  "updated": "2020-05-21T08:20:21.839796+00:00", 
  "conceptrecid": "3836690", 
  "revision": 3, 
  "id": 3836691, 
  "metadata": {
    "access_right_category": "success", 
    "doi": "10.5281/zenodo.3836691", 
    "version": "3.0", 
    "language": "eng", 
    "title": "Code Duplication and Reuse in Jupyter Notebooks", 
    "license": {
      "id": "CC-BY-4.0"
    }, 
    "related_identifiers": [
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.3836690", 
        "relation": "isVersionOf"
      }
    ], 
    "relations": {
      "version": [
        {
          "count": 1, 
          "index": 0, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "3836690"
          }, 
          "is_last": true, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "3836691"
          }
        }
      ]
    }, 
    "communities": [
      {
        "id": "msr"
      }
    ], 
    "subjects": [], 
    "keywords": [
      "Jupyter", 
      "computational notebooks", 
      "code duplication", 
      "code clones", 
      "code reuse", 
      "data analysis", 
      "data exploration", 
      "exploratory programming"
    ], 
    "publication_date": "2020-05-29", 
    "creators": [
      {
        "affiliation": "University of Victoria", 
        "name": "Koenzen, Andreas P."
      }, 
      {
        "affiliation": "University of Victoria", 
        "name": "Ernst, Neil A."
      }, 
      {
        "affiliation": "University of Victoria", 
        "name": "Storey, Margaret-Anne D."
      }
    ], 
    "meeting": {
      "acronym": "VL/HCC", 
      "url": "https://conf.researchr.org/home/vlhcc2020", 
      "dates": "10-14 August 2020", 
      "place": "Dunedin, New Zealand", 
      "title": "IEEE Symposium on Visual Languages and Human-Centric Computing"
    }, 
    "access_right": "open", 
    "resource_type": {
      "subtype": "conferencepaper", 
      "type": "publication", 
      "title": "Conference paper"
    }, 
    "description": "<p>This is a replication package for the paper &quot;Code Duplication and Reuse in Jupyter Notebooks&quot;, which was accepted as a full paper at the&nbsp;<strong>IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) 2020</strong>.</p>\n\n<p>The contents of this package are as follows:</p>\n\n<ul>\n\t<li><strong>code</strong> folder: Contains all code necessary to reproduce the first study presented in the paper.</li>\n\t<li><strong>data</strong> folder: Contains all data pertaining to the first study presented in the paper.\n\t<ul>\n\t\t<li><strong>clones_1582405629.json.gz</strong> file: JSON database with all detected clones and their&nbsp;metadata for the dataset used.</li>\n\t\t<li><strong>commit_data_1589997765.pkl.gz</strong> file: Pandas pickle file containing the table &quot;commit_data&quot; (see the <em>database.sql</em> file).</li>\n\t\t<li><strong>commits_1589997765.pkl.gz</strong> file: Pandas pickle file containing the table &quot;commit&quot; (see the <em>database.sql</em> file).</li>\n\t\t<li><strong>counter_1582422799.json.gz</strong> file: JSON database with statistics about all repositories in the dataset used.</li>\n\t\t<li><strong>notebooks_1589997765.pkl.gz</strong> file: Pandas pickle file containing&nbsp;the table &quot;notebooks&quot; (see the <em>database.sql</em> file).</li>\n\t\t<li><strong>parameter_tunning</strong> folder: Folder with the results of the parameter tuning phase. Each TXT file corresponds to a different threshold.</li>\n\t</ul>\n\t</li>\n</ul>\n\n<p>To run the code and reproduce the results, a fully functional <strong>Python 3.7</strong> environment is needed. The requirements can be found in the <strong>requirements.txt</strong> file. If the start scripts are to be used, <strong>Python 3.7.7</strong> must be installed via <strong>pyenv</strong>. This is NOT&nbsp;necessary to run the notebooks:&nbsp;the <strong>JupyterLab</strong> environment can be launched manually by issuing the&nbsp;command <strong>&quot;jupyter lab notebooks&quot;</strong>.</p>\n\n<p>Commands:</p>\n\n<ol>\n\t<li>To install Python dependencies via Pip: <strong>&quot;pip install -r requirements.txt&quot;</strong></li>\n\t<li>To launch Jupyter: <strong>&quot;source start-jupyter.sh&quot;</strong></li>\n</ol>\n\n<p>Optional:</p>\n\n<ol>\n\t<li>To access environment variables from Jupyter, the file <strong>env_variables.py</strong> can be edited to add new variables or modify current ones.</li>\n</ol>\n\n<p><strong>SHA1SUM of ZIP file:</strong>&nbsp;c9b5d7e2dbe0574b73f2d2b67adb9e18fdcfb513</p>"
  }
}
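
The record lists an MD5 checksum, a file size, and a SHA1SUM for the ZIP file, and the data folder ships gzip-compressed JSON files. The following is a minimal sketch, using only the Python standard library, of how a downloaded copy might be checked against those values and how one of the JSON files might be opened. The helper names (file_digest, verify_archive, load_json_gz) are hypothetical and not part of the package, and the sketch assumes each .json.gz file holds a single UTF-8 JSON document.

```python
import gzip
import hashlib
import json
from pathlib import Path

# Values copied from the record above.
EXPECTED_MD5 = "18cccb23601930f522ae345b83fe91bc"
EXPECTED_SHA1 = "c9b5d7e2dbe0574b73f2d2b67adb9e18fdcfb513"
EXPECTED_SIZE = 2453258443  # bytes


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Hash a file in chunks so the ~2.4 GB archive never sits in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, md5=EXPECTED_MD5, sha1=EXPECTED_SHA1, size=EXPECTED_SIZE):
    """Return True only if size, MD5 and SHA-1 all match the expected values."""
    path = Path(path)
    if path.stat().st_size != size:
        return False
    return file_digest(path, "md5") == md5 and file_digest(path, "sha1") == sha1


def load_json_gz(path):
    """Read a gzip-compressed JSON file (assumed: one UTF-8 JSON document)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    archive = Path("VLHCC_2020_Paper_Reproducibility_Pkg.zip")
    if archive.exists():
        print("archive OK" if verify_archive(archive) else "archive does NOT match the record")
```

The .pkl.gz tables could presumably be read with pandas.read_pickle, which by default infers gzip compression from the file extension. Unpickling may require a pandas version compatible with the one pinned in requirements.txt.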
                  All versions   This version
Views             83             83
Downloads         10             10
Data volume       24.5 GB        24.5 GB
Unique views      76             76
Unique downloads  9              9
