Software Open Access

Replication package for identify bot comments

Mehdi Golzadeh; Alexandre Decan; Eleni Constantinou; Tom Mens


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">GitHub, automated comments, distributed software development, classification model, empirical analysis</subfield>
  </datafield>
  <controlfield tag="005">20210305002722.0</controlfield>
  <controlfield tag="001">4580998</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Mons</subfield>
    <subfield code="0">(orcid)0000-0002-5824-5823</subfield>
    <subfield code="a">Alexandre Decan</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Eindhoven University of Technology</subfield>
    <subfield code="0">(orcid)0000-0002-4242-2581</subfield>
    <subfield code="a">Eleni Constantinou</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Mons</subfield>
    <subfield code="0">(orcid)0000-0003-3636-5020</subfield>
    <subfield code="a">Tom Mens</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">9256878</subfield>
    <subfield code="z">md5:b9b8dc427ad45b78f5cbf4e1b6c4e10d</subfield>
    <subfield code="u">https://zenodo.org/record/4580998/files/comments_classification_replicationpackage.zip</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2021-05-22</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">software</subfield>
    <subfield code="o">oai:zenodo.org:4580998</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">University of Mons</subfield>
    <subfield code="0">(orcid)0000-0003-1041-439X</subfield>
    <subfield code="a">Mehdi Golzadeh</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Replication package for identify bot comments</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;This repository contains the replication package for our study about identifying bots at the level of their activity in GitHub submitted to BotSE&amp;#39;21 conference (*&amp;quot;Identifying bot activity in GitHub pull request and issue comments&amp;quot;*).&lt;br&gt;
A link to the paper will be added to this README as soon as the paper is accepted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ground-truth dataset&lt;/strong&gt;&lt;br&gt;
The dataset is extracted from the ground-truth dataset of our study on &lt;a href=&quot;https://arxiv.org/abs/2010.03303&quot;&gt;identifying bots&lt;/a&gt;, published in the Journal of Systems and Software (JSS).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replication package&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A- Dataset preparation.ipynb: This notebook splits the dataset into two disjoint sets for training and testing. To avoid any conflict with GDPR regulations, we&amp;#39;ve anonymised the account name columns.&lt;/p&gt;

&lt;p&gt;B- Model construction.ipynb: This notebook uses grid-search cross-validation to find the best classifier and construct the final model. The replication package was originally created with Python 3.8; the dependencies required to run these notebooks are listed in requirements.txt and can be installed automatically using pip install -r requirements.txt.&lt;/p&gt;

&lt;p&gt;C- Model evaluation.ipynb: This notebook contains scripts to evaluate the classifier.&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.4580393</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.4580998</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">software</subfield>
  </datafield>
</record>
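
The record above can be read programmatically. The following is a minimal sketch, not part of the replication package: it uses only the Python standard library, and the names SAMPLE, subfields and checksum_matches are hypothetical helpers introduced for illustration. SAMPLE is an abbreviated copy of the export, keeping only the author (100), DOI (024) and file (856) fields. The sketch assumes that the 856 "z" subfield always has the form "algorithm:hexdigest", as it does in this record.

```python
import hashlib
import xml.etree.ElementTree as ET

# Namespace used by MARC21 slim XML, taken from the record's xmlns attribute.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Abbreviated copy of the exported record (illustration only).
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Mehdi Golzadeh</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.4580998</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">9256878</subfield>
    <subfield code="z">md5:b9b8dc427ad45b78f5cbf4e1b6c4e10d</subfield>
    <subfield code="u">https://zenodo.org/record/4580998/files/comments_classification_replicationpackage.zip</subfield>
  </datafield>
</record>"""


def subfields(record, tag, code):
    """Return the text of every subfield `code` in datafields with `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [element.text for element in record.findall(path, MARC_NS)]


def checksum_matches(data, checksum_field):
    """Compare the digest of `data` (bytes) with an 'algorithm:hexdigest' string."""
    algorithm, expected = checksum_field.split(":", 1)
    return hashlib.new(algorithm, data).hexdigest() == expected.lower()


record = ET.fromstring(SAMPLE)
print("First author:", subfields(record, "100", "a"))
print("DOI:", subfields(record, "024", "a"))
print("File URL:", subfields(record, "856", "u"))
print("Checksum:", subfields(record, "856", "z"))
```

After downloading the archive, its bytes can be passed to checksum_matches together with the 856 "z" value to confirm that the file was not corrupted in transit.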