Conference paper Open Access

Code Duplication and Reuse in Jupyter Notebooks

Koenzen, Andreas P.; Ernst, Neil A.; Storey, Margaret-Anne D.


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam##2200000uu#4500</leader>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Jupyter, computational notebooks, code duplication, code clones, code reuse, data analysis, data exploration, exploratory programming</subfield>
  </datafield>
  <controlfield tag="005">20200521082021.0</controlfield>
  <controlfield tag="001">3836691</controlfield>
  <datafield tag="711" ind1=" " ind2=" ">
    <subfield code="d">10-14 August 2020</subfield>
    <subfield code="g">VL/HCC</subfield>
    <subfield code="a">IEEE Symposium on Visual Languages and Human-Centric Computing</subfield>
    <subfield code="c">Dunedin, New Zealand</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Victoria</subfield>
    <subfield code="a">Ernst, Neil A.</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Victoria</subfield>
    <subfield code="a">Storey, Margaret-Anne D.</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">2453258443</subfield>
    <subfield code="z">md5:18cccb23601930f522ae345b83fe91bc</subfield>
    <subfield code="u">https://zenodo.org/record/3836691/files/VLHCC_2020_Paper_Reproducibility_Pkg.zip</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="y">Conference website</subfield>
    <subfield code="u">https://conf.researchr.org/home/vlhcc2020</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2020-05-29</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire</subfield>
    <subfield code="p">user-msr</subfield>
    <subfield code="o">oai:zenodo.org:3836691</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">University of Victoria</subfield>
    <subfield code="a">Koenzen, Andreas P.</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Code Duplication and Reuse in Jupyter Notebooks</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-msr</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;This is a replication package for the paper: &amp;quot;Code Duplication and Reuse in Jupyter Notebooks&amp;quot;, which was accepted as a full paper at the&amp;nbsp;&lt;strong&gt;IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) 2020&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The contents of this package are as follows:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;&lt;strong&gt;code&lt;/strong&gt; folder: Contains all necessary code to reproduce the first study presented in the paper.&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;data&lt;/strong&gt; folder: Contains all data pertaining to the first study presented in the paper.
	&lt;ul&gt;
		&lt;li&gt;&lt;strong&gt;clones_1582405629.json.gz&lt;/strong&gt; file: JSON database of all detected clones and their&amp;nbsp;metadata for the dataset used.&lt;/li&gt;
		&lt;li&gt;&lt;strong&gt;commit_data_1589997765.pkl.gz&lt;/strong&gt; file: Pandas pickle file containing the table &amp;quot;commit_data&amp;quot; (See &lt;em&gt;database.sql&lt;/em&gt; file).&lt;/li&gt;
		&lt;li&gt;&lt;strong&gt;commits_1589997765.pkl.gz&lt;/strong&gt; file: Pandas pickle file containing the table &amp;quot;commit&amp;quot; (See &lt;em&gt;database.sql&lt;/em&gt; file).&lt;/li&gt;
		&lt;li&gt;&lt;strong&gt;counter_1582422799.json.gz&lt;/strong&gt; file: JSON database with statistics about all repositories in the dataset used.&lt;/li&gt;
		&lt;li&gt;&lt;strong&gt;notebooks_1589997765.pkl.gz&lt;/strong&gt; file: Pandas pickle file containing&amp;nbsp;the table &amp;quot;notebooks&amp;quot; (See &lt;em&gt;database.sql&lt;/em&gt; file).&lt;/li&gt;
		&lt;li&gt;&lt;strong&gt;parameter_tunning&lt;/strong&gt; folder: Folder with the results of the parameter tuning phase. Each TXT file corresponds to a different threshold.&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To fully reproduce the study, a working &lt;strong&gt;Python 3.7&lt;/strong&gt; environment is needed. The requirements are listed in the &lt;strong&gt;requirements.txt&lt;/strong&gt; file. If the starting scripts are to be used, &lt;strong&gt;Python 3.7.7&lt;/strong&gt; must be installed via &lt;strong&gt;pyenv&lt;/strong&gt;. This is NOT&amp;nbsp;necessary to run the notebooks:&amp;nbsp;the &lt;strong&gt;JupyterLab&lt;/strong&gt; environment can also be launched manually by issuing the&amp;nbsp;command &lt;strong&gt;&amp;quot;jupyter lab notebooks&amp;quot;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Commands:&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;To install Python dependencies via Pip: &lt;strong&gt;&amp;quot;pip install -r requirements.txt&amp;quot;&lt;/strong&gt;&lt;/li&gt;
	&lt;li&gt;To launch Jupyter: &lt;strong&gt;&amp;quot;source start-jupyter.sh&amp;quot;&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Optional:&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;To access environment variables from Jupyter, the file &lt;strong&gt;env_variables.py&lt;/strong&gt; can be edited to add new variables or modify current ones.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;SHA1SUM of ZIP file:&lt;/strong&gt;&amp;nbsp;c9b5d7e2dbe0574b73f2d2b67adb9e18fdcfb513&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.3836690</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.3836691</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">publication</subfield>
    <subfield code="b">conferencepaper</subfield>
  </datafield>
</record>
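
The record above publishes two digests for the ZIP file: an MD5 value in the 856 field and a SHA1SUM in the description. A minimal sketch of how a downloaded copy could be checked against both is shown below. The function names and the local file path are hypothetical and are not part of the replication package; only the two digest strings and the file name come from the record.

```python
import hashlib
from pathlib import Path

# Digests copied from the record above (856 $z and the description's SHA1SUM).
EXPECTED_MD5 = "18cccb23601930f522ae345b83fe91bc"
EXPECTED_SHA1 = "c9b5d7e2dbe0574b73f2d2b67adb9e18fdcfb513"


def file_digests(path, chunk_size=1 << 20):
    """Return (md5, sha1) hex digests of a file, reading it in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def verify_package(path):
    """Return True only if both digests match the published values."""
    return file_digests(path) == (EXPECTED_MD5, EXPECTED_SHA1)


if __name__ == "__main__":
    # Hypothetical location: assumes the ZIP was saved to the current directory.
    zip_path = Path("VLHCC_2020_Paper_Reproducibility_Pkg.zip")
    if zip_path.exists():
        print("checksums match" if verify_package(zip_path) else "checksum mismatch")
    else:
        print(f"{zip_path} not found; download it first")
```

The file is read in chunks because the record lists its size as 2453258443 bytes, which is too large to load into memory comfortably in one call.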